<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Damaso Sanoja</title>
    <description>The latest articles on DEV Community by Damaso Sanoja (@damasosanoja).</description>
    <link>https://dev.to/damasosanoja</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F346479%2F3b8ceb9d-fe63-4052-8d28-4728bceb7111.jpeg</url>
      <title>DEV Community: Damaso Sanoja</title>
      <link>https://dev.to/damasosanoja</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/damasosanoja"/>
    <language>en</language>
    <item>
      <title>Database Maintenance: Tracing Production Incidents to Their Root Cause</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Fri, 22 May 2026 12:00:38 +0000</pubDate>
      <link>https://dev.to/damasosanoja/database-maintenance-tracing-production-incidents-to-their-root-cause-327e</link>
      <guid>https://dev.to/damasosanoja/database-maintenance-tracing-production-incidents-to-their-root-cause-327e</guid>
      <description>&lt;p&gt;Database maintenance fails when it runs on a calendar instead of on signal. &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver17" rel="noopener noreferrer"&gt;Fragmentation, stale statistics, log growth, and lock contention are functions of write workload&lt;/a&gt;, not weekly schedules. Scheduled maintenance skips the tables that need it most, and the resulting incident fires before anyone notices the gap.&lt;/p&gt;

&lt;p&gt;This article replaces the cron job with a response system. Four observable symptoms (I/O degradation, query plan regression, storage pressure, and lock contention) each trace back to a specific maintenance root cause, with fixes for SQL Server, PostgreSQL, and MySQL. Silent corruption, the one failure mode that produces no precursor signal, gets its own detection-first treatment. A closing scorecard lets you self-assess.&lt;/p&gt;

&lt;h2&gt;
  
  
  First Response: Wait State Triage Across Engines
&lt;/h2&gt;

&lt;p&gt;When a slow query alert fires, the first diagnostic step is the same regardless of engine: check what the query is waiting on. &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql" rel="noopener noreferrer"&gt;Wait states&lt;/a&gt; are the universal entry point for database incident triage. They tell you whether the problem is I/O bound, lock bound, or CPU bound, and that classification determines which section of this article contains your fix.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzq3zyo1ffbfil2g78y4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzq3zyo1ffbfil2g78y4.png" alt="Wait State Triage" width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  SQL Server wait types
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.sqlskills.com/help/waits/pageiolatch_sh/" rel="noopener noreferrer"&gt;&lt;code&gt;PAGEIOLATCH_SH&lt;/code&gt;&lt;/a&gt; means the query is waiting for data pages to be read from disk into the buffer pool. This points to index fragmentation, &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql" rel="noopener noreferrer"&gt;buffer cache pressure&lt;/a&gt;, or storage subsystem saturation. &lt;a href="https://www.sqlskills.com/help/waits/lck_m_s/" rel="noopener noreferrer"&gt;&lt;code&gt;LCK_M_S&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://www.sqlskills.com/help/waits/lck_m_x/" rel="noopener noreferrer"&gt;&lt;code&gt;LCK_M_X&lt;/code&gt;&lt;/a&gt; indicate row or table-level lock contention from a concurrent transaction or a maintenance operation holding locks. &lt;a href="https://sqlperformance.com/2015/08/sql-performance/more-on-cxpacket-waits-skewed-parallelism" rel="noopener noreferrer"&gt;&lt;code&gt;CXPACKET&lt;/code&gt;&lt;/a&gt; (visible in &lt;code&gt;sys.dm_exec_requests&lt;/code&gt;) signals parallelism skew, which typically traces to stale statistics or a missing index causing the optimizer to choose an expensive parallel plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  PostgreSQL and MySQL equivalents
&lt;/h3&gt;

&lt;p&gt;PostgreSQL exposes wait diagnostics through &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_stat_activity&lt;/code&gt;&lt;/a&gt;. The query below is your triage entry point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL: active session wait events&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;wait_event&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'idle'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;backend_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'client backend'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The diagram above maps each value to its target section. One non-obvious case is worth calling out: a &lt;code&gt;NULL&lt;/code&gt; &lt;code&gt;wait_event&lt;/code&gt; while &lt;code&gt;state = 'active'&lt;/code&gt; indicates the query is compute-bound (the PostgreSQL equivalent of CPU pressure), which can point toward stale statistics or a plan regression rather than I/O.&lt;/p&gt;

&lt;p&gt;For MySQL, &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/performance-schema-wait-tables.html" rel="noopener noreferrer"&gt;&lt;code&gt;performance_schema.events_waits_current&lt;/code&gt;&lt;/a&gt; is the source for the values shown in the diagram. Verify &lt;code&gt;performance_schema = ON&lt;/code&gt; in &lt;code&gt;my.cnf&lt;/code&gt; first, as it is disabled by default in some MySQL 5.x builds and carries non-zero overhead; on MySQL 8.0+ it is enabled by default. &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/show-processlist.html" rel="noopener noreferrer"&gt;&lt;code&gt;SHOW PROCESSLIST&lt;/code&gt;&lt;/a&gt; gives a quicker but less granular view.&lt;/p&gt;

&lt;p&gt;Once you have identified the wait type, the sections below trace each category to its maintenance root cause and prescribe the fix. For hybrid topologies that span on-prem and cloud-managed instances, &lt;a href="https://www.manageengine.com/it-operations-management/database-monitoring.html" rel="noopener noreferrer"&gt;ManageEngine OpManager Nexus&lt;/a&gt; surfaces wait-state and slow-query data across both in a single triage view through its &lt;a href="https://www.site24x7.com/help/database-monitoring/" rel="noopener noreferrer"&gt;SaaS delivery for managed databases&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Symptom: I/O Degradation and Read Amplification
&lt;/h2&gt;

&lt;p&gt;A buffer cache hit ratio drifting below the 95-99% range that healthy OLTP workloads maintain is the cross-engine signal that the engine is reading more pages from disk than memory can satisfy. &lt;/p&gt;

&lt;p&gt;SQL Server practitioners typically treat 90% as a warning and 85% as an action threshold; PostgreSQL and MySQL expose equivalents in &lt;code&gt;pg_statio_user_tables&lt;/code&gt; and &lt;code&gt;information_schema.INNODB_BUFFER_POOL_STATS&lt;/code&gt; (or &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt;). The most common cause is index fragmentation: pages split, B-tree leaves scatter across non-contiguous extents, and one logical read becomes several physical I/Os. Read amplification surfaces as &lt;code&gt;PAGEIOLATCH&lt;/code&gt; waits on SQL Server, &lt;code&gt;DataFileRead&lt;/code&gt; on PostgreSQL, and elevated &lt;code&gt;innodb_data_file&lt;/code&gt; waits on MySQL. &lt;/p&gt;

&lt;p&gt;On cloud-managed instances where DMV access is restricted (RDS, Azure SQL Managed Instance), OpManager Nexus's SaaS delivery surfaces the same buffer-pool visibility through its agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Diagnosing index bloat
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SQL Server:&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-index-physical-stats-transact-sql" rel="noopener noreferrer"&gt;&lt;code&gt;sys.dm_db_index_physical_stats&lt;/code&gt;&lt;/a&gt; is the authoritative source for fragmentation data. The query below returns indexes above 5% fragmentation with more than 1,000 pages (the &lt;a href="https://www.brentozar.com/archive/2009/02/index-fragmentation-findings-part-2-size-matters/" rel="noopener noreferrer"&gt;page count filter matters&lt;/a&gt; because rebuilding very small indexes produces negligible performance improvement):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;OBJECT_NAME&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tbl_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;idx_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_type_desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_fragmentation_in_percent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dm_db_index_physical_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;DB_ID&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'LIMITED'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ips&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object_id&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_fragmentation_in_percent&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_fragmentation_in_percent&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;'LIMITED'&lt;/code&gt; scan mode traverses only the index allocation structure, making it safe and fast on production systems. &lt;code&gt;'SAMPLED'&lt;/code&gt; reads a statistical sample of data pages for more accurate numbers at moderate I/O cost on very large tables or partitioned indexes. &lt;code&gt;'DETAILED'&lt;/code&gt; performs a full scan; reserve it for offline assessment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL:&lt;/strong&gt; The &lt;code&gt;pg_stat_user_tables&lt;/code&gt; view provides the first signal. A &lt;code&gt;dead_pct&lt;/code&gt; above 10-20% on a high-write table is a common trigger for manual VACUUM (this range aligns with practitioner guidance, with the autovacuum default kicking in at 20%):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;n_dead_tup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;n_live_tup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_dead_tup&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_live_tup&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dead_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;last_vacuum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;last_autovacuum&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_tables&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;n_live_tup&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For index-level bloat (physical B-tree bloat that VACUUM does not reclaim), the &lt;code&gt;pgstattuple&lt;/code&gt; extension exposes two functions. &lt;code&gt;pgstattuple()&lt;/code&gt; returns &lt;code&gt;free_percent&lt;/code&gt;, the wasted-space ratio that is the PostgreSQL equivalent of &lt;code&gt;avg_fragmentation_in_percent&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;pgstattuple&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pgstattuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'orders_created_at_idx'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;pgstatindex()&lt;/code&gt; returns the B-tree-specific metrics: &lt;code&gt;leaf_fragmentation&lt;/code&gt; (percentage of leaf pages not in logical order, indicating physical scatter) and &lt;code&gt;avg_leaf_density&lt;/code&gt; (below 50% suggests the index has many near-empty pages):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pgstatindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'orders_created_at_idx'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both functions perform a full scan of the target relation, so on a multi-hundred-GB index expect runtime and I/O comparable to a sequential read of the entire object — schedule them like any other heavy diagnostic, not in a hot loop.&lt;/p&gt;

&lt;p&gt;High &lt;code&gt;free_percent&lt;/code&gt; with low &lt;code&gt;leaf_fragmentation&lt;/code&gt; may indicate space reclaimable by VACUUM rather than a full rebuild. Values of &lt;code&gt;free_percent&lt;/code&gt; in the 20-30% range are a &lt;a href="https://aws.amazon.com/blogs/database/improve-postgresql-performance-using-the-pgstattuple-extension/" rel="noopener noreferrer"&gt;widely used trigger for REINDEX&lt;/a&gt;; consult your workload and current community guidance to calibrate the threshold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MySQL:&lt;/strong&gt; Query &lt;code&gt;information_schema.TABLES&lt;/code&gt; for InnoDB tablespace fragmentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;table_schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;data_mb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_free&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;free_mb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_free&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_length&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;index_length&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;data_free&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;frag_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TABLES&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'InnoDB'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;data_free&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;data_free&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This metric is meaningful only with per-table tablespaces (&lt;code&gt;innodb_file_per_table = ON&lt;/code&gt;, the default since MySQL 5.6); on shared-tablespace deployments, &lt;code&gt;data_free&lt;/code&gt; reflects unused space in the global &lt;code&gt;ibdata&lt;/code&gt; file and is repeated identically across every InnoDB row.&lt;/p&gt;

&lt;p&gt;Tables with &lt;code&gt;frag_pct&lt;/code&gt; above 20% are commonly treated as candidates for &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/optimize-table.html" rel="noopener noreferrer"&gt;&lt;code&gt;OPTIMIZE TABLE&lt;/code&gt;&lt;/a&gt; or &lt;code&gt;pt-online-schema-change&lt;/code&gt; (this threshold is a practitioner guideline rather than a MySQL-documented limit).&lt;/p&gt;

&lt;h3&gt;
  
  
  Remediation by engine and downtime tolerance
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes" rel="noopener noreferrer"&gt;Microsoft's documentation on index reorganization and rebuild&lt;/a&gt; maps fragmentation levels to two SQL Server operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5-30% fragmentation:&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-index-transact-sql" rel="noopener noreferrer"&gt;&lt;code&gt;ALTER INDEX idx_name ON tbl_name REORGANIZE&lt;/code&gt;&lt;/a&gt; compacts leaf-level pages incrementally as an online operation. It can be interrupted mid-run without corrupting the index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Above 30%:&lt;/strong&gt; &lt;a href="https://www.mssqltips.com/sqlservertip/8063/sql-index-rebuild-vs-reorganize-comparison/" rel="noopener noreferrer"&gt;&lt;code&gt;ALTER INDEX idx_name ON tbl_name REBUILD&lt;/code&gt;&lt;/a&gt; recreates the index. Offline by default (acquires a schema modification lock that &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/indexes/guidelines-for-online-index-operations" rel="noopener noreferrer"&gt;blocks concurrent access&lt;/a&gt;). Add &lt;code&gt;WITH (ONLINE = ON)&lt;/code&gt; on Enterprise edition to keep the index available during the rebuild. Note that even online rebuilds acquire a brief Schema Modification (Sch-M) lock at the beginning and end of the operation, typically milliseconds, but long enough to cause noticeable waits on extremely high-concurrency workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On SQL Server 2017+, combine &lt;code&gt;ONLINE = ON&lt;/code&gt; with &lt;code&gt;RESUMABLE = ON&lt;/code&gt; and a configurable &lt;code&gt;MAX_DURATION&lt;/code&gt; to pause and resume long rebuilds: &lt;code&gt;ALTER INDEX idx_name ON tbl_name REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60)&lt;/code&gt;. Resume with &lt;code&gt;ALTER INDEX idx_name ON tbl_name REBUILD WITH (RESUME)&lt;/code&gt;. &lt;code&gt;RESUMABLE = ON&lt;/code&gt; requires &lt;code&gt;ONLINE = ON&lt;/code&gt; and is Enterprise-edition-only on SQL Server 2017; SQL Server 2019+ also enables it on Standard and Web editions, so verify your edition before scripting against this syntax.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.sqlskills.com/blogs/paul/where-do-the-books-online-index-fragmentation-thresholds-come-from/" rel="noopener noreferrer"&gt;5% floor matters equally&lt;/a&gt;. Running REORGANIZE on a 3% fragmented index generates log activity, consumes I/O, and produces no measurable query improvement.&lt;/p&gt;

&lt;p&gt;For PostgreSQL, &lt;a href="https://www.postgresql.org/docs/current/sql-vacuum.html" rel="noopener noreferrer"&gt;&lt;code&gt;VACUUM&lt;/code&gt;&lt;/a&gt; reclaims dead tuple storage and updates the visibility map. &lt;a href="https://www.postgresql.org/docs/current/sql-analyze.html" rel="noopener noreferrer"&gt;&lt;code&gt;ANALYZE&lt;/code&gt;&lt;/a&gt; updates planner statistics. &lt;a href="https://www.postgresql.org/docs/current/sql-reindex.html" rel="noopener noreferrer"&gt;&lt;code&gt;REINDEX&lt;/code&gt;&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/postgresql-maintenance-rds-aurora/reindex.html" rel="noopener noreferrer"&gt;rebuilds the B-tree structure&lt;/a&gt; when physical index bloat is confirmed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;VERBOSE&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Blocking rebuild (requires maintenance window):&lt;/span&gt;
&lt;span class="k"&gt;REINDEX&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;transactions_created_at_idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Non-blocking rebuild (PostgreSQL 12+):&lt;/span&gt;
&lt;span class="k"&gt;REINDEX&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;transactions_created_at_idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;REINDEX CONCURRENTLY&lt;/code&gt; cannot run inside a transaction block and takes longer than the standard form, but it allows writes to continue during the rebuild. Beyond immediate remediation, &lt;code&gt;VACUUM VERBOSE&lt;/code&gt; output is worth reviewing regularly on your heaviest-write tables. It provides dead tuple counts, page recycling data, and cleanup statistics that give indirect signals of table health. PostgreSQL's &lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html" rel="noopener noreferrer"&gt;autovacuum handles routine dead tuple cleanup&lt;/a&gt; automatically, but under high-velocity delete workloads it can fall behind. The official PostgreSQL documentation on routine vacuuming covers tuning &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; and &lt;code&gt;autovacuum_vacuum_threshold&lt;/code&gt; for tables where the defaults prove too conservative.&lt;/p&gt;

&lt;p&gt;For MySQL, &lt;code&gt;OPTIMIZE TABLE&lt;/code&gt; defragments the tablespace and rebuilds statistics in a single operation. In MySQL 8.0+, this runs online for regular InnoDB tables with only brief metadata locks at prepare and commit phases, but the full copy can take significant time on large tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Internally, InnoDB maps &lt;code&gt;OPTIMIZE TABLE&lt;/code&gt; to &lt;code&gt;ALTER TABLE ... FORCE&lt;/code&gt;, rebuilding the clustered index and all secondary indexes. For zero-downtime execution on large tables, &lt;a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html" rel="noopener noreferrer"&gt;&lt;code&gt;pt-online-schema-change&lt;/code&gt;&lt;/a&gt; from Percona Toolkit performs the same rebuild while keeping the original table live:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pt-online-schema-change &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--alter&lt;/span&gt; &lt;span class="s2"&gt;"ENGINE=InnoDB"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--execute&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;D&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;app_prod,t&lt;span class="o"&gt;=&lt;/span&gt;events,h&lt;span class="o"&gt;=&lt;/span&gt;127.0.0.1,F&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;/.my.cnf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This maintains a shadow copy and replays writes via triggers throughout the rebuild. The &lt;code&gt;--execute&lt;/code&gt; flag is required; without it the tool runs in dry-run mode only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remediation lookup by symptom severity:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom Severity&lt;/th&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Downtime Tolerance&lt;/th&gt;
&lt;th&gt;Recommended Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mild (frag &amp;lt; 5% / dead_pct &amp;lt; 10%)&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moderate (5-30%)&lt;/td&gt;
&lt;td&gt;SQL Server&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;ALTER INDEX ... REORGANIZE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Severe (&amp;gt; 30%)&lt;/td&gt;
&lt;td&gt;SQL Server&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;ALTER INDEX ... REBUILD WITH (ONLINE=ON) [Enterprise]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Severe (&amp;gt; 30%)&lt;/td&gt;
&lt;td&gt;SQL Server&lt;/td&gt;
&lt;td&gt;Available&lt;/td&gt;
&lt;td&gt;ALTER INDEX ... REBUILD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elevated (dead_pct &amp;gt; 10%)&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;VACUUM ANALYZE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High bloat (free_percent &amp;gt; 30%)&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;REINDEX CONCURRENTLY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elevated (frag_pct &amp;gt; 20%)&lt;/td&gt;
&lt;td&gt;MySQL&lt;/td&gt;
&lt;td&gt;Available&lt;/td&gt;
&lt;td&gt;OPTIMIZE TABLE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elevated (frag_pct &amp;gt; 20%)&lt;/td&gt;
&lt;td&gt;MySQL&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;pt-online-schema-change&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With fragmentation addressed, the next failure category that produces slow queries is stale statistics, which causes the optimizer to choose a scan where an index seek would be orders of magnitude faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Symptom: Query Plan Regression
&lt;/h2&gt;

&lt;p&gt;The execution plan shows a table scan where an index seek ran yesterday. The optimizer has not changed; the data it relies on has. This is a statistics problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Diagnosing stale statistics
&lt;/h3&gt;

&lt;p&gt;The SQL Server &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver17" rel="noopener noreferrer"&gt;optimizer uses row count estimates and data distribution histograms&lt;/a&gt; to choose between index seeks and table scans. When those statistics are weeks out of date on a fast-growing table, the optimizer picks a scan where a seek would be dramatically faster. Run &lt;code&gt;UPDATE STATISTICS table_name WITH FULLSCAN&lt;/code&gt; on any table that receives large batch loads. The &lt;a href="https://learn.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql?view=sql-server-ver17" rel="noopener noreferrer"&gt;&lt;code&gt;WITH SAMPLE&lt;/code&gt;&lt;/a&gt; variant uses a row sampling percentage that can miss skewed distributions on large tables, producing statistics that look current but reflect an unrepresentative subset.&lt;/p&gt;

&lt;p&gt;To detect indexes suffering from stale statistics or poor plan choices, query &lt;code&gt;sys.dm_db_index_usage_stats&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;OBJECT_NAME&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;object_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tbl_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;index_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;user_seeks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;user_scans&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;user_lookups&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dm_db_index_usage_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;database_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DB_ID&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_scans&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Indexes with zero seeks but high scans are candidates for statistics updates or missing index evaluation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/planner-stats.html" rel="noopener noreferrer"&gt;PostgreSQL's &lt;code&gt;ANALYZE&lt;/code&gt; command&lt;/a&gt; and &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/analyze-table.html" rel="noopener noreferrer"&gt;MySQL's &lt;code&gt;ANALYZE TABLE&lt;/code&gt;&lt;/a&gt; update planner statistics independently from &lt;code&gt;VACUUM&lt;/code&gt; and &lt;code&gt;OPTIMIZE TABLE&lt;/code&gt; respectively. On PostgreSQL, autovacuum runs &lt;code&gt;ANALYZE&lt;/code&gt; automatically after a &lt;a href="https://www.postgresql.org/docs/17/runtime-config-autovacuum.html" rel="noopener noreferrer"&gt;configurable percentage of rows change&lt;/a&gt; (controlled by &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt;, default 0.1 or 10%), but that default is &lt;a href="https://aws.amazon.com/blogs/database/understanding-autovacuum-in-amazon-rds-for-postgresql-environments/" rel="noopener noreferrer"&gt;too high for large tables&lt;/a&gt;. A 200-million-row table would need 20 million row changes to trigger autovacuum's ANALYZE pass, by which point the query plan may have been wrong for hours. Lowering &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; to 0.01 or using &lt;code&gt;autovacuum_analyze_threshold&lt;/code&gt; with per-table overrides addresses this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Updating statistics without disruption
&lt;/h3&gt;

&lt;p&gt;On SQL Server, &lt;code&gt;UPDATE STATISTICS&lt;/code&gt; generally does not block queries (it runs with NOLOCK semantics on data reads), though asynchronous statistics updates can cause brief schema lock contention during query compilation in high-workload scenarios. It does invalidate cached execution plans for the affected table: immediately after, SQL Server will recompile plans on next execution, which can briefly spike CPU on systems with many concurrent queries against the updated table. Run during low-traffic windows on heavily queried tables. The choice between &lt;code&gt;FULLSCAN&lt;/code&gt; and &lt;code&gt;SAMPLE&lt;/code&gt; depends on table size and distribution skew. &lt;/p&gt;

&lt;p&gt;For tables in the small-to-medium range, &lt;code&gt;FULLSCAN&lt;/code&gt; typically completes quickly enough to run during off-peak hours (the practical upper bound depends on hardware, but many teams use roughly 100M rows as a rule-of-thumb cutoff). For larger tables, a higher sample percentage (such as &lt;code&gt;SAMPLE 20 PERCENT&lt;/code&gt; or &lt;code&gt;SAMPLE 30 PERCENT&lt;/code&gt;) typically provides a better tradeoff between accuracy and duration than the default sample, though the optimal percentage varies by workload.&lt;/p&gt;

&lt;p&gt;On PostgreSQL, &lt;a href="https://www.crunchydata.com/blog/indexes-selectivity-and-statistics" rel="noopener noreferrer"&gt;&lt;code&gt;ANALYZE&lt;/code&gt;&lt;/a&gt; reads a configurable sample (default &lt;code&gt;default_statistics_target = 100&lt;/code&gt;, meaning 30,000 rows per column) and does not lock the table. Run it manually after any bulk load or partition swap.&lt;/p&gt;

&lt;p&gt;On MySQL, &lt;code&gt;ANALYZE TABLE&lt;/code&gt; is a lightweight operation on InnoDB that reads the index tree's random dive samples. It is a fast operation: in MySQL 8.0+, &lt;code&gt;ANALYZE TABLE&lt;/code&gt; uses online DDL semantics, avoiding the full read lock that earlier versions required. Capture &lt;code&gt;EXPLAIN&lt;/code&gt; for representative queries before and after to confirm the planner picked up the new statistics.&lt;/p&gt;

&lt;p&gt;OpManager Nexus automates detection of query plan regression on-prem through historical baseline comparison and anomaly flagging. The same capability extends to cloud-managed databases through its SaaS delivery, where &lt;a href="https://www.site24x7.com/database-monitoring.html" rel="noopener noreferrer"&gt;slow query log analysis&lt;/a&gt; drills into queries exceeding a configurable execution-time threshold. The Automated Remediation section below covers how to wire that detection into corrective workflows.&lt;/p&gt;

&lt;p&gt;Statistics failures are invisible until the query plan degrades. Storage failures are equally silent, until a disk fills and takes the database offline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Symptom: Storage Pressure and Runaway Growth
&lt;/h2&gt;

&lt;p&gt;A disk usage alert fires at 85% capacity. The database server has been running for months without anyone checking how fast the log files or tablespaces are growing. The root cause splits into two categories: unmanaged transaction log growth and missing archiving strategy. Both are maintenance failures that monitoring should have caught weeks earlier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transaction log and WAL management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SQL Server:&lt;/strong&gt; A &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/logs/troubleshoot-a-full-transaction-log-sql-server-error-9002?view=sql-server-ver17" rel="noopener noreferrer"&gt;full recovery model database without regular transaction log backups&lt;/a&gt; will grow its log file until the disk fills, and a full data volume is an immediate production outage. To check current log space usage across all databases, run &lt;code&gt;DBCC SQLPERF(LOGSPACE);&lt;/code&gt;, which returns log size, space used percentage, and status for every database. For a single database, query &lt;code&gt;sys.databases&lt;/code&gt; for the &lt;code&gt;log_reuse_wait_desc&lt;/code&gt; column, which tells you exactly why the log cannot be truncated (e.g., &lt;code&gt;LOG_BACKUP&lt;/code&gt;, &lt;code&gt;ACTIVE_TRANSACTION&lt;/code&gt;). Schedule log backups at an interval matching your Recovery Point Objective (RPO): for most OLTP workloads, intervals in the range of 5-30 minutes are commonly used, with tighter intervals for high-transaction systems, though the right frequency is workload-specific.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-shrinkfile-transact-sql?view=sql-server-ver17" rel="noopener noreferrer"&gt;&lt;code&gt;DBCC SHRINKFILE&lt;/code&gt;&lt;/a&gt; on the log file is a last resort for reclaiming space after an unexpected log growth event. The reason it is a last resort, rather than a routine cleanup tool, is the side effect on &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/logs/manage-the-size-of-the-transaction-log-file?view=sql-server-ver17" rel="noopener noreferrer"&gt;Virtual Log Files&lt;/a&gt; (VLFs), the internal segments SQL Server divides the transaction log into. Each shrink-then-regrow cycle adds a new VLF, so a log that has been shrunk repeatedly ends up fragmented into many small VLFs instead of a few large ones. That fragmentation degrades sequential log write throughput and &lt;a href="https://www.sqlskills.com/blogs/paul/why-you-should-not-shrink-your-data-files/" rel="noopener noreferrer"&gt;increases recovery time&lt;/a&gt;. The fix is to address the root cause (missing log backups, long-running transactions) rather than shrinking on a schedule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL:&lt;/strong&gt; &lt;a href="https://www.postgresql.org/docs/current/continuous-archiving.html" rel="noopener noreferrer"&gt;WAL (Write-Ahead Log)&lt;/a&gt; management serves the same function as SQL Server's transaction log. The &lt;code&gt;archive_mode&lt;/code&gt; and &lt;code&gt;archive_command&lt;/code&gt; settings control whether completed WAL segments are shipped to archive storage. &lt;a href="https://www.percona.com/blog/five-reasons-why-wal-segments-accumulate-in-the-pg_wal-directory-in-postgresql/" rel="noopener noreferrer"&gt;Without archiving enabled, WAL segments accumulate&lt;/a&gt; in &lt;code&gt;pg_wal/&lt;/code&gt; until disk fills. The &lt;a href="https://pgpedia.info/w/wal_keep_size.html" rel="noopener noreferrer"&gt;&lt;code&gt;wal_keep_size&lt;/code&gt;&lt;/a&gt; parameter (PostgreSQL 13+, replacing &lt;code&gt;wal_keep_segments&lt;/code&gt;) sets a floor for retained WAL data, but does not cap growth. For production systems, configure continuous archiving with &lt;code&gt;archive_mode = on&lt;/code&gt; and point &lt;code&gt;archive_command&lt;/code&gt; to your backup infrastructure (pgBackRest, Barman, or cloud-native equivalents).&lt;/p&gt;

&lt;p&gt;To verify archiving is active and current: &lt;code&gt;SELECT * FROM pg_stat_archiver;&lt;/code&gt; Check &lt;code&gt;last_archived_wal&lt;/code&gt; timestamp and &lt;code&gt;failed_count&lt;/code&gt;. A non-zero &lt;code&gt;failed_count&lt;/code&gt; or a stale &lt;code&gt;last_archived_time&lt;/code&gt; means WAL segments are accumulating. Also: &lt;code&gt;SELECT count(*), pg_size_pretty(sum(size)) FROM pg_ls_waldir();&lt;/code&gt; (PostgreSQL 10+) shows total WAL directory size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MySQL:&lt;/strong&gt; &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html" rel="noopener noreferrer"&gt;Binary logs&lt;/a&gt; (binlogs) serve replication and point-in-time recovery. Without rotation, they grow indefinitely. &lt;a href="https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-3.html" rel="noopener noreferrer"&gt;&lt;code&gt;expire_logs_days&lt;/code&gt;&lt;/a&gt; (deprecated in MySQL 8.0.3) or &lt;code&gt;binlog_expire_logs_seconds&lt;/code&gt; (MySQL 8.0+) controls automatic purge. Setting &lt;code&gt;binlog_expire_logs_seconds = 604800&lt;/code&gt; retains seven days of binary logs, which is sufficient for most replication topologies. Run &lt;code&gt;PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY&lt;/code&gt; for one-time cleanup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capacity forecasting with OpManager Nexus
&lt;/h3&gt;

&lt;p&gt;Reacting to a disk alert at 85% leaves little room for planned action. OpManager Nexus's &lt;a href="https://www.manageengine.com/network-monitoring/help/forecast-reports.html" rel="noopener noreferrer"&gt;AI/ML-based storage forecasting&lt;/a&gt; uses up to 14 days of history to predict when &lt;a href="https://www.manageengine.com/network-monitoring/storage-capacity-forecasting-planning.html" rel="noopener noreferrer"&gt;storage will hit 80%, 90%, and 100%&lt;/a&gt;, giving your team a "disk full in N days" signal once it has at least 3 days of data. Its &lt;a href="https://www.manageengine.com/network-monitoring/help/adaptive-thresholds.html" rel="noopener noreferrer"&gt;adaptive thresholds&lt;/a&gt; learn baseline behavior so alerts fire on genuine anomalies rather than every batch job, and the Database Tab surfaces individual database size, data and log file utilization, and growth trends.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; OpManager Nexus's own monitoring data retention (configured under Settings &amp;gt; General Settings &amp;gt; &lt;a href="https://www.manageengine.com/network-monitoring/help/database-archiving-and-maintenance.html" rel="noopener noreferrer"&gt;Database Maintenance&lt;/a&gt;) is independent of your production database storage. Defaults are 7, 30, and 365 days for detailed, hourly, and daily statistics.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Use OpManager Nexus's forecast reports to verify your archiving cadence keeps pace with growth: if the forecast shows 80% capacity in 30 days but your archive job runs monthly, increase frequency or provision more storage.&lt;/p&gt;

&lt;p&gt;Storage pressure is a passive failure that accumulates over time. Lock contention is an active failure: the maintenance operation meant to fix the database becomes the source of the incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Symptom: Lock Contention from Maintenance Operations
&lt;/h2&gt;

&lt;p&gt;A spike in blocked sessions immediately after a scheduled maintenance run is direct evidence that the REBUILD or REORGANIZE collided with production traffic and &lt;a href="https://www.mssqltips.com/sqlservertip/5880/why-is-index-reorganize-and-update-statistics-causing-sql-server-blocking/" rel="noopener noreferrer"&gt;created lock contention&lt;/a&gt;. The maintenance job is supposed to fix performance, but index REBUILDs running without &lt;code&gt;ONLINE = ON&lt;/code&gt; during peak traffic or without a maintenance window hold locks that block concurrent queries, turning the fix into the incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identifying maintenance-induced blocking
&lt;/h3&gt;

&lt;p&gt;Correlating maintenance timing with OpManager Nexus's Sessions Tab is how you distinguish maintenance-induced blocking from application-level contention. If blocked session counts spike within minutes of a maintenance window opening, the maintenance job is the cause. On SQL Server, check &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-requests-transact-sql" rel="noopener noreferrer"&gt;&lt;code&gt;sys.dm_exec_requests&lt;/code&gt;&lt;/a&gt; for sessions with &lt;code&gt;wait_type&lt;/code&gt; values starting with &lt;code&gt;LCK_M_*&lt;/code&gt;, then look up the head-of-chain blocker and inspect its &lt;code&gt;command&lt;/code&gt; column for &lt;code&gt;ALTER INDEX&lt;/code&gt; or &lt;code&gt;DBCC&lt;/code&gt; operations.&lt;/p&gt;

&lt;p&gt;On PostgreSQL, &lt;code&gt;pg_stat_activity&lt;/code&gt; shows &lt;code&gt;Lock&lt;/code&gt; wait events with &lt;code&gt;wait_event&lt;/code&gt; values like &lt;code&gt;relation&lt;/code&gt; or &lt;code&gt;transactionid&lt;/code&gt;. If the blocking PID is running &lt;code&gt;REINDEX&lt;/code&gt; or &lt;code&gt;VACUUM FULL&lt;/code&gt;, that is maintenance-induced contention. For cloud-managed instances where Sessions Tab access is unavailable, OpManager Nexus's &lt;a href="https://www.site24x7.com/database-monitoring.html" rel="noopener noreferrer"&gt;SaaS delivery&lt;/a&gt; surfaces lock contention and blocking session counts on its database performance dashboard for the same triage signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Online and resumable operations
&lt;/h3&gt;

&lt;p&gt;The fix is operational: use online operations and schedule them outside peak traffic windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Server:&lt;/strong&gt; Use &lt;code&gt;ALTER INDEX ... REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60)&lt;/code&gt; as described in the I/O Degradation section. The duration is any positive integer in minutes; set it based on your maintenance window. &lt;code&gt;REORGANIZE&lt;/code&gt; is always online and interruptible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL:&lt;/strong&gt; &lt;code&gt;REINDEX INDEX CONCURRENTLY&lt;/code&gt; (introduced in the I/O Degradation section) avoids exclusive locks. &lt;code&gt;VACUUM&lt;/code&gt; without &lt;code&gt;FULL&lt;/code&gt; does not block reads or writes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MySQL:&lt;/strong&gt; Standard &lt;code&gt;OPTIMIZE TABLE&lt;/code&gt; already runs as online DDL on MySQL 8.0+ (introduced in the I/O Degradation section). Reach for &lt;code&gt;pt-online-schema-change&lt;/code&gt; when you need finer control over lock duration on very large tables, or when you want triggered shadow-copy semantics that &lt;code&gt;OPTIMIZE TABLE&lt;/code&gt; does not offer.&lt;/p&gt;

&lt;p&gt;The four symptom categories above all produce observable performance signals before they become outages. Corruption is different: it produces no signal until it surfaces as query failures or data loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Symptom: Silent Corruption and Integrity Failures
&lt;/h2&gt;

&lt;p&gt;Because corruption produces no precursor wait events or latency drift, detection is a deliberate scheduled act, not an alert response. Regular integrity checks are the primary detection mechanism, supplemented by storage-level checksums, page verification, and reliable backups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Server:&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-checkdb-transact-sql?view=sql-server-ver17" rel="noopener noreferrer"&gt;&lt;code&gt;DBCC CHECKDB&lt;/code&gt;&lt;/a&gt; catches &lt;a href="https://techcommunity.microsoft.com/blog/sqlserversupport/sql-server-database-corruption-causes-detection-and-some-details-behind-dbcc-che/4460631" rel="noopener noreferrer"&gt;page corruption, allocation errors, and consistency violations&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Recommended production form: suppresses informational messages, shows only errors&lt;/span&gt;
&lt;span class="n"&gt;DBCC&lt;/span&gt; &lt;span class="n"&gt;CHECKDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ProductionDB'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;NO_INFOMSGS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ALL_ERRORMSGS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For large databases where a full DBCC CHECKDB is too slow for a maintenance window, &lt;code&gt;DBCC CHECKDB ... WITH PHYSICAL_ONLY&lt;/code&gt; checks page and record header integrity without logical consistency checks and completes significantly faster. Corruption surfaces in the SQL Server error log as &lt;a href="https://support.microsoft.com/en-us/help/2015755/how-to-troubleshoot-a-msg-823-error-in-sql-server" rel="noopener noreferrer"&gt;messages Msg 823, 824, or 825&lt;/a&gt;. To proactively check for known corruption events, query the suspect pages table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;db_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_update_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;msdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suspect_pages&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Event_type 1 = 823/824 errors, 2 = bad checksum, 3 = torn page. A non-empty result requires immediate DBCC CHECKDB and restore planning.&lt;/p&gt;

&lt;p&gt;Running DBCC CHECKDB as frequently as your maintenance windows allow is the safe path. Many experts recommend daily on all databases; if that is impractical, prioritize critical databases and shorten the interval on large ones using &lt;code&gt;WITH PHYSICAL_ONLY&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL:&lt;/strong&gt; The &lt;a href="https://www.postgresql.org/docs/14/app-pgamcheck.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_amcheck&lt;/code&gt;&lt;/a&gt; utility (PostgreSQL 14+) verifies B-tree index integrity by checking that every heap tuple referenced by an index entry actually exists and that index entries are in the correct sort order. The default invocation is fast enough for routine scheduled checks and catches most corruption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_amcheck mydb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After an unexpected crash, storage event, or replication failure, run the thorough variant on critical tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_amcheck &lt;span class="nt"&gt;--heapallindexed&lt;/span&gt; &lt;span class="nt"&gt;--parent-check&lt;/span&gt; mydb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--heapallindexed&lt;/code&gt; performs a deeper check that every heap tuple has a corresponding index entry; &lt;code&gt;--parent-check&lt;/code&gt; verifies cross-level B-tree invariants. Both flags increase runtime substantially, so reserve them for incident response or post-event verification rather than the routine schedule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MySQL:&lt;/strong&gt; &lt;code&gt;mysqlcheck&lt;/code&gt; provides table-level integrity verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mysqlcheck &lt;span class="nt"&gt;--check&lt;/span&gt; &lt;span class="nt"&gt;--all-databases&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-p&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For individual tables, &lt;code&gt;CHECK TABLE table_name&lt;/code&gt; within the MySQL client performs the same operation. InnoDB tables benefit from &lt;a href="https://dev.mysql.com/doc/refman/9.7/en/check-table.html" rel="noopener noreferrer"&gt;&lt;code&gt;CHECK TABLE ... FOR UPGRADE&lt;/code&gt;&lt;/a&gt; after major version upgrades to verify storage format compatibility.&lt;/p&gt;

&lt;p&gt;Running these checks manually is the safety net. The next section shows how to automate the response so the platform acts before the on-call engineer logs in.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Alert to Fix: Automated Remediation Across Engines
&lt;/h2&gt;

&lt;p&gt;When the alert fires at 3 AM, having the platform execute the remediation automatically matters far more than knowing the fix. OpManager Nexus's IT Workflow Automation triggers a custom monitoring script when an alert threshold is breached: the script queries the symptom's diagnostic surface (fragmentation, dead tuples, log space), evaluates severity, and runs the remediation.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQL Server: wiring remediation into OpManager Nexus
&lt;/h3&gt;

&lt;p&gt;OpManager Nexus accepts PowerShell or shell scripts as &lt;a href="https://www.manageengine.com/network-monitoring/script-monitoring.html" rel="noopener noreferrer"&gt;custom monitors&lt;/a&gt; (Custom Script Monitors require build 12.7 or later). The integration pattern matches the PostgreSQL and MySQL examples below: query &lt;code&gt;sys.dm_db_index_physical_stats&lt;/code&gt; for fragmentation, branch on the threshold, issue &lt;code&gt;ALTER INDEX REORGANIZE&lt;/code&gt; or &lt;code&gt;REBUILD WITH (ONLINE = ON)&lt;/code&gt; accordingly, and emit one log line per action so the run shows up in the monitor's history. Run the script under a service account with at least &lt;code&gt;db_ddladmin&lt;/code&gt; on the target database; for SQL authentication or cross-domain setups, pull credentials from a secrets store rather than embedding them.&lt;/p&gt;

&lt;h3&gt;
  
  
  PostgreSQL and MySQL shell automation
&lt;/h3&gt;

&lt;p&gt;For PostgreSQL, a cron-driven shell script can query &lt;code&gt;pg_stat_user_tables&lt;/code&gt; for bloated tables and trigger remediation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# PostgreSQL automated vacuum/reindex for tables exceeding dead tuple threshold.&lt;/span&gt;
&lt;span class="c"&gt;# Credentials sourced from ~/.pgpass (chmod 600); export PGPASSFILE if non-default.&lt;/span&gt;
&lt;span class="nv"&gt;PGHOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"localhost"&lt;/span&gt;
&lt;span class="nv"&gt;PGPORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"5432"&lt;/span&gt;
&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"app_prod"&lt;/span&gt;
&lt;span class="nv"&gt;PGUSER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"maintenance_user"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PGPASSFILE&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="p"&gt;/.pgpass&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nv"&gt;DEAD_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;15
&lt;span class="nv"&gt;BLOAT_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30

&lt;span class="c"&gt;# VACUUM tables with high dead tuple ratio&lt;/span&gt;
psql &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGHOST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGPORT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGUSER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;'|'&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
  SELECT schemaname, relname, round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2)
  FROM pg_stat_user_tables
  WHERE n_live_tup &amp;gt; 10000
    AND round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) &amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;$DEAD_THRESHOLD&lt;/span&gt;&lt;span class="s2"&gt;
"&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'|'&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; schema table dead_pct&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%d %H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; | VACUUM ANALYZE &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;schema&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | dead_pct=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;dead_pct&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;%"&lt;/span&gt;
  psql &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGHOST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGPORT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGUSER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"VACUUM ANALYZE &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;schema&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For MySQL, a similar approach queries &lt;code&gt;information_schema.TABLES&lt;/code&gt; and triggers &lt;code&gt;OPTIMIZE TABLE&lt;/code&gt;. Use a MySQL option file instead of embedding credentials in the script (create &lt;code&gt;~/.my.cnf&lt;/code&gt; with &lt;code&gt;[client]&lt;/code&gt; credentials and restrict permissions to 600):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# MySQL automated optimize for InnoDB tables exceeding fragmentation threshold&lt;/span&gt;
&lt;span class="nv"&gt;MYSQL_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"localhost"&lt;/span&gt;
&lt;span class="nv"&gt;MYSQL_DB&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"app_prod"&lt;/span&gt;

&lt;span class="nv"&gt;FRAG_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;20

mysql &lt;span class="nt"&gt;--defaults-extra-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.my.cnf"&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MYSQL_HOST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-N&lt;/span&gt; &lt;span class="nt"&gt;-B&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"
  SELECT table_name, round(data_free / (data_length + index_length + data_free) * 100, 2) AS frag_pct
  FROM information_schema.TABLES
  WHERE table_schema = '&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MYSQL_DB&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;'
    AND engine = 'InnoDB'
    AND data_free &amp;gt; 0
    AND round(data_free / (data_length + index_length + data_free) * 100, 2) &amp;gt; &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;FRAG_THRESHOLD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
"&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; table frag_pct&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%d %H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; | OPTIMIZE TABLE &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | frag_pct=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;frag_pct&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;%"&lt;/span&gt;
  mysql &lt;span class="nt"&gt;--defaults-extra-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.my.cnf"&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MYSQL_HOST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MYSQL_DB&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"OPTIMIZE TABLE &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Schedule either script via cron (e.g., &lt;code&gt;0 3 * * * /opt/scripts/pg_maintenance.sh &amp;gt;&amp;gt; /var/log/db_maintenance.log 2&amp;gt;&amp;amp;1&lt;/code&gt;) and monitor the log output through OpManager Nexus's custom monitor integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud-managed database automation
&lt;/h3&gt;

&lt;p&gt;For databases running on Amazon RDS, Aurora, or Azure SQL, OpManager Nexus's SaaS delivery provides the cloud-side counterpart of the PowerShell and shell automation patterns above. Its &lt;a href="https://www.site24x7.com/help/admin/configuration-profiles/actions.html" rel="noopener noreferrer"&gt;IT Automation module&lt;/a&gt; triggers corrective actions from threshold breaches and anomaly detections, and &lt;a href="https://www.site24x7.com/anomaly-detection.html" rel="noopener noreferrer"&gt;AI-powered baselines&lt;/a&gt; replace the manual threshold tuning that self-managed instances require. For RDS specifically, &lt;a href="https://www.site24x7.com/help/it-automation/rds-actions.html" rel="noopener noreferrer"&gt;service actions&lt;/a&gt; like start, stop, and reboot with failover are surfaced directly. &lt;a href="https://www.site24x7.com/help/database-monitoring/" rel="noopener noreferrer"&gt;Engine-specific monitor setup&lt;/a&gt; for SQL Server, PostgreSQL, and MySQL is documented separately. Threshold profiles let you apply equivalent alert configurations across dev, staging, and production monitors, so a query that fragments an index under realistic staging load surfaces in slow query detection before it reaches production scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Maintenance Health Scorecard: Assessing Your Current Posture
&lt;/h2&gt;

&lt;p&gt;Instead of running through the diagnostic queries from scratch, use this scorecard to assess your maintenance posture. Each item references the diagnostic approach covered in its corresponding section above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I/O health (see: I/O Degradation section)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] SQL Server: Run the &lt;code&gt;sys.dm_db_index_physical_stats&lt;/code&gt; query (filter the results at 30% fragmentation). Count of indexes returned: ___&lt;/li&gt;
&lt;li&gt;[ ] PostgreSQL: Run the &lt;code&gt;pg_stat_user_tables&lt;/code&gt; dead tuple query. Tables with dead_pct above 10-20% are candidates for immediate attention: ___&lt;/li&gt;
&lt;li&gt;[ ] MySQL: Run the &lt;code&gt;information_schema.TABLES&lt;/code&gt; fragmentation query. Tables with frag_pct above 20%: ___&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Statistics freshness (see: Query Plan Regression section)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] SQL Server: Check &lt;code&gt;sys.dm_db_index_usage_stats&lt;/code&gt; for indexes with zero seeks but high scans (plan regression or poorly matched index)&lt;/li&gt;
&lt;li&gt;[ ] PostgreSQL: Verify &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; is set below 0.1 for tables above 100 million rows&lt;/li&gt;
&lt;li&gt;[ ] MySQL: Run &lt;code&gt;ANALYZE TABLE&lt;/code&gt; on your top 10 tables by write volume; capture &lt;code&gt;EXPLAIN&lt;/code&gt; output for representative queries before and after to confirm planner statistics changed as expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Storage trajectory (see: Storage Pressure section)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] OpManager Nexus forecast report confirms sufficient capacity runway before any threshold crossing: Yes / No&lt;/li&gt;
&lt;li&gt;[ ] Transaction log backup job (SQL Server) or WAL archiving (PostgreSQL) is confirmed running and last backup verified: Yes / No&lt;/li&gt;
&lt;li&gt;[ ] Binary log rotation (MySQL) is configured with &lt;code&gt;binlog_expire_logs_seconds&lt;/code&gt; set to an explicit value: Yes / No&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Integrity baseline (see: Silent Corruption section)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] SQL Server: &lt;code&gt;DBCC CHECKDB&lt;/code&gt; last run date on critical databases: ___&lt;/li&gt;
&lt;li&gt;[ ] PostgreSQL: &lt;code&gt;pg_amcheck&lt;/code&gt; last run date (or equivalent manual check): ___&lt;/li&gt;
&lt;li&gt;[ ] MySQL: &lt;code&gt;mysqlcheck --check&lt;/code&gt; last run date: ___&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Automation coverage (see: Automated Remediation section)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] At least one automated remediation script is deployed, scheduled, and confirmed to be producing output logs: Yes / No&lt;/li&gt;
&lt;li&gt;[ ] OpManager Nexus alert thresholds are configured and tested for key database health metrics (BCHR, disk utilization, blocked sessions): Yes / No&lt;/li&gt;
&lt;li&gt;[ ] Maintenance windows are scheduled based on monitoring signals, not calendar dates: Yes / No&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cross-reference those results against OpManager Nexus's slow query and session data (on-prem Performance Tab or SaaS Database Metrics dashboard). If a table in the top results by size also appears as a source of slow query detections, that is your highest-priority maintenance target.&lt;/p&gt;

</description>
      <category>database</category>
      <category>devops</category>
      <category>performance</category>
      <category>sql</category>
    </item>
    <item>
      <title>vLLM in Production: Ranked Configuration Decisions, Failure Modes, and the Architecture That Makes Them Work</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Wed, 20 May 2026 11:37:06 +0000</pubDate>
      <link>https://dev.to/damasosanoja/vllm-in-production-ranked-configuration-decisions-failure-modes-and-the-architecture-that-makes-2g7p</link>
      <guid>https://dev.to/damasosanoja/vllm-in-production-ranked-configuration-decisions-failure-modes-and-the-architecture-that-makes-2g7p</guid>
      <description>&lt;p&gt;Production &lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; deployments live or die on three configuration decisions, and getting any of them wrong shows up early: &lt;a href="https://docs.vllm.ai/en/latest/configuration/conserving_memory/" rel="noopener noreferrer"&gt;static KV cache allocation&lt;/a&gt; will OOM your GPU long before billing teaches you the same lesson. This guide is written for the operator who already accepts vLLM as the default serving engine and now needs a ranked decision surface, a runbook for the failure modes, and a clean view of the architecture that makes the knobs behave the way they do.&lt;/p&gt;

&lt;p&gt;Configuration guidance and architecture descriptions in this article reflect &lt;a href="https://github.com/vllm-project/vllm/releases" rel="noopener noreferrer"&gt;vLLM 0.20.x and the V1 engine&lt;/a&gt;, which has been the default since v0.8.0 (released March 2025). Flag behavior and metric names may differ on releases before v0.8.0, when V1 was opt-in via &lt;code&gt;VLLM_USE_V1=1&lt;/code&gt;. All commands assume vLLM installed via &lt;code&gt;pip install vllm&lt;/code&gt; (tested on Python 3.10+ / CUDA 12.x). For containerized deployments, the official image is &lt;code&gt;vllm/vllm-openai&lt;/code&gt;. Check the &lt;a href="https://docs.vllm.ai/en/latest/getting_started/installation.html" rel="noopener noreferrer"&gt;installation guide&lt;/a&gt; for version-specific CUDA requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Cost-per-token: the three decisions that dominate vLLM deployments&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At scale with a real inter-token latency SLA, &lt;a href="https://docs.vllm.ai/en/stable/configuration/optimization/" rel="noopener noreferrer"&gt;vLLM cost&lt;/a&gt; is shaped by configuration choices long before GPU budget enters the conversation. Land the three below, and the remaining tuning surface yields diminishing returns; miss any of them, and no amount of GPU spend will rescue the SLA.&lt;/p&gt;

&lt;p&gt;The first decision is &lt;strong&gt;framework choice itself&lt;/strong&gt;. vLLM is the right default for most teams, but &lt;a href="https://github.com/NVIDIA/TensorRT-LLM" rel="noopener noreferrer"&gt;TensorRT-LLM&lt;/a&gt;, &lt;a href="https://github.com/sgl-project/sglang" rel="noopener noreferrer"&gt;SGLang&lt;/a&gt;, and &lt;a href="https://github.com/huggingface/text-generation-inference" rel="noopener noreferrer"&gt;TGI&lt;/a&gt; each win in narrow conditions. Committing to vLLM under the wrong workload (deeply branching agentic call graphs, fixed-shape NVIDIA-only deployments at extreme scale) is a slower-to-fix mistake than a flag value.&lt;/p&gt;

&lt;p&gt;The second is the &lt;strong&gt;memory budget&lt;/strong&gt;: how much VRAM you cede to KV cache versus weights and activations, expressed through &lt;a href="https://docs.vllm.ai/en/stable/configuration/engine_args/" rel="noopener noreferrer"&gt;&lt;code&gt;--gpu-memory-utilization&lt;/code&gt; and &lt;code&gt;--max-model-len&lt;/code&gt;&lt;/a&gt;. This is the variable that determines how many concurrent sequences your pool can hold before the scheduler starts &lt;a href="https://arxiv.org/pdf/2309.06180" rel="noopener noreferrer"&gt;preempting&lt;/a&gt;. It is also the variable that operators most often leave at defaults on shared infrastructure and then debug for a week.&lt;/p&gt;

&lt;p&gt;The third is the &lt;strong&gt;batching and admission strategy&lt;/strong&gt;: continuous batching is on by default, but &lt;a href="https://docs.vllm.ai/en/v0.4.2/models/performance.html" rel="noopener noreferrer"&gt;&lt;code&gt;--enable-chunked-prefill&lt;/code&gt; and &lt;code&gt;--enable-prefix-caching&lt;/code&gt;&lt;/a&gt; decide whether prefill work corrupts your decode latency and whether repeated prompt prefixes are paid for once or every time. Two flags, both cheap to enable, both with workload-dependent payoffs.&lt;/p&gt;

&lt;p&gt;The rest of this guide treats these three in order: framework choice first, then the architecture that makes the budget and batching knobs predictable, followed by deployment shapes, memory budgeting, the measurement contract that validates your configuration, the ranked knobs themselves, and finally the failure modes you will see when one of them is off.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serving framework: vLLM, SGLang, TensorRT-LLM, or TGI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The decision is dominated by workload shape and hardware constraint. The flowchart below leads; the prose underneath fills in the cases where the answer is not “vLLM.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwwc78plw7zrmhx9xx07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwwc78plw7zrmhx9xx07.png" alt="Serving framework" width="800" height="1552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;When vLLM is not the right default&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/sgl-project/sglang" rel="noopener noreferrer"&gt;SGLang&lt;/a&gt; earns the choice when the workload is &lt;a href="https://arxiv.org/abs/2312.07104" rel="noopener noreferrer"&gt;structured generation or multi-step agent programs&lt;/a&gt;. Its RadixAttention reuses KV state across branching call graphs more aggressively than vLLM’s prefix caching, which matters when a single user turn fans out into a tree of constrained-output sub-calls. For linear chat and completion endpoints with unique prompts, that advantage is minimal to negligible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NVIDIA/TensorRT-LLM" rel="noopener noreferrer"&gt;TensorRT-LLM&lt;/a&gt; has a non-trivial throughput advantage on fixed shapes and a fixed NVIDIA SKU, but the cost is operational: every change to model version, GPU tier, or sequence-length configuration forces an engine rebuild measured in tens of minutes for large models. Teams running one model on one hardware tier at a scale where even marginal throughput gains justify operational overhead can get value from TensorRT-LLM. Most teams don’t.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/huggingface/text-generation-inference" rel="noopener noreferrer"&gt;Text Generation Inference (TGI)&lt;/a&gt; overlaps with vLLM on capability and integrates tightly with the Hugging Face ecosystem. The deciding factor is often ecosystem fit: if Hub repos, Spaces, and HF-format configs are already wired into the deployment path, TGI requires less reconfiguration to adopt. Optimization momentum since 2024 has favored vLLM, particularly on the &lt;a href="https://llm-d.ai/blog/kvcache-wins-you-can-see" rel="noopener noreferrer"&gt;scheduling and KV-cache management&lt;/a&gt; side, so greenfield deployments lean vLLM.&lt;/p&gt;

&lt;p&gt;For everything else, including &lt;a href="https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/benchmark-docker/vllm.html" rel="noopener noreferrer"&gt;AMD GPUs&lt;/a&gt; and any workload where future GPU portability is a constraint, vLLM is the answer. Before sizing the deployment, understanding the architectural primitives that make vLLM’s configuration surface predictable will make every subsequent decision more legible.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;vLLM architecture: PagedAttention, continuous batching, and V1 modularity&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The configuration surface above is only as good as the runtime behavior that backs it. Three architectural pieces give the budget knob, the batching flags, and the scheduler-tuning options their teeth. The framing here is “why does that knob work?” rather than “here is the breakthrough.”&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;PagedAttention as virtual memory for KV&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://blog.vllm.ai/2023/06/20/vllm.html" rel="noopener noreferrer"&gt;PagedAttention&lt;/a&gt; treats the KV cache the way an operating system treats process memory: as fixed-size physical blocks (16 tokens per block by default) accessed through a per-sequence logical-to-physical block table. Physical blocks live anywhere in GPU memory and don’t need to be contiguous. When a sequence advances, the allocator hands it one more block at a time. When the sequence terminates, every block returns to the free pool immediately. Block sharing across sequences with identical prefix tokens is the foundation that makes &lt;a href="https://docs.vllm.ai/en/stable/design/prefix_caching/" rel="noopener noreferrer"&gt;prefix caching&lt;/a&gt; possible.&lt;/p&gt;

&lt;p&gt;The flowchart below shows how the operator-set memory budget translates into runtime behavior, starting from the configuration value rather than from request arrival.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foikz0ocryeo48igvj7o7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foikz0ocryeo48igvj7o7.png" alt="vLLM architecture" width="766" height="1754"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The block-pool sizing step at the top is what makes &lt;code&gt;--gpu-memory-utilization&lt;/code&gt; an operator-level budget. The reclaim path at the bottom is what makes eviction an observable event rather than a silent failure: the &lt;a href="https://docs.vllm.ai/en/stable/design/metrics/" rel="noopener noreferrer"&gt;metrics endpoint&lt;/a&gt; reports free-block count and the scheduler logs reclaim actions, which is why the failure-mode catalog can name eviction as a diagnosable signature.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Continuous batching at the iteration level&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The other half of the throughput story is iteration-level scheduling. Static batching waits for a full batch of N sequences, runs the forward pass, returns all outputs, then admits the next batch; any sequence finishing early leaves its slot idle until the batch completes. The vLLM scheduler &lt;a href="https://www.anyscale.com/blog/continuous-batching-llm-inference" rel="noopener noreferrer"&gt;operates at the iteration level&lt;/a&gt;: when a sequence completes, its slot is freed and a waiting request can be admitted at the next iteration. The result is higher GPU utilization at steady state and lower average queue time, both of which the ranked-knobs section relies on when it claims that prefix caching and chunked prefill change the ITL distribution rather than just the mean.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;V1 modularity&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://blog.vllm.ai/2025/01/27/v1-alpha-release.html" rel="noopener noreferrer"&gt;vLLM V1 re-architecture&lt;/a&gt; splits the scheduler, KV cache manager, and model runner into distinct, modular components. For operators, the practical change is a cleaner configuration surface; the modular design also provides developer-level hackability for custom scheduler and cache manager implementations. The disaggregated-serving direction in the closing section rests on this modular substrate.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Deployment surfaces: single-GPU, tensor-parallel, serverless&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Three deployment shapes cover production vLLM workloads. The VRAM sizing rule is the same in all three: budget weights as 2 bytes per parameter at BF16/FP16, 1 byte at INT8, and 0.5 bytes at INT4, then subtract weights from &lt;code&gt;--gpu-memory-utilization x VRAM&lt;/code&gt; to get the KV pool budget.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Single-GPU&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The minimal configuration on an L40S 48GB is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve mistralai/Mistral-7B-Instruct-v0.3 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-gpu-memory-utilization&lt;/span&gt; 0.90 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-max-model-len&lt;/span&gt; 16384
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3" rel="noopener noreferrer"&gt;Mistral-7B-Instruct-v0.3&lt;/a&gt; at BF16 occupies roughly 14 GB for weights. At 0.90 utilization on a 48 GB L40S the engine has a 43.2 GB envelope, which leaves roughly 29 GB for the KV pool. Capping &lt;code&gt;--max-model-len&lt;/code&gt; at 16K rather than the model’s 32K maximum halves the worst-case per-sequence KV claim and &lt;a href="https://docs.anyscale.com/llm/serving/parameter-tuning" rel="noopener noreferrer"&gt;roughly doubles the concurrency&lt;/a&gt; the same pool can support; in production chat traffic the truncation is invisible. On an A100 40GB the same model leaves about 22 GB for KV; on an A100 80GB, about 58 GB. The numerical method is identical, only the GPU envelope changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Tensor-parallel for larger models&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A 70B-class model in BF16 will not fit on a single GPU. &lt;a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct" rel="noopener noreferrer"&gt;Qwen2.5-72B-Instruct&lt;/a&gt; at BF16 occupies roughly 144 GB of weights, which requires at minimum two 80 GB GPUs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve Qwen/Qwen2.5-72B-Instruct &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-tensor-parallel-size&lt;/span&gt; 2 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-gpu-memory-utilization&lt;/span&gt; 0.90 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-max-model-len&lt;/span&gt; 32768
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cap &lt;code&gt;--max-model-len&lt;/code&gt; to your actual use case; the Qwen2.5-72B architectural maximum is 128K, and leaving it at the default with only two 80 GB GPUs will exhaust the KV pool at moderate concurrency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.vllm.ai/en/stable/serving/parallelism_scaling/" rel="noopener noreferrer"&gt;Tensor parallelism&lt;/a&gt; shards the attention and feed-forward weight matrices across the configured number of devices and exchanges activation tensors at each layer boundary. The interconnect topology matters. &lt;a href="https://www.nvidia.com/en-us/data-center/nvlink/" rel="noopener noreferrer"&gt;NVLink&lt;/a&gt; carries that traffic at bandwidths that keep the per-layer cost in the noise; PCIe is functional but adds measurable overhead per forward pass, with workload-dependent throughput losses that can reach the mid-double-digit-percent range in adverse topologies. If the host machine has the model split across GPUs that aren’t NVLink-bridged, expect to see that overhead reflected in the throughput numbers, not just the topology diagram.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Serverless via Runpod&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For teams that need a vLLM endpoint on H100, A100, or L40S without operating GPU infrastructure, &lt;a href="https://www.runpod.io/serverless-gpu" rel="noopener noreferrer"&gt;Runpod’s Serverless&lt;/a&gt; provisions one in minutes (initial model download may extend total setup time). The console walkthrough, endpoint creation, vLLM worker selection, model ID, &lt;code&gt;MAX_MODEL_LEN&lt;/code&gt; / &lt;code&gt;GPU_MEMORY_UTILIZATION&lt;/code&gt; / &lt;code&gt;DTYPE&lt;/code&gt; env vars, and &lt;code&gt;HF_TOKEN&lt;/code&gt; for gated checkpoints like Llama-3 or Gemma is covered end-to-end in the &lt;a href="https://docs.runpod.io/serverless/quickstart" rel="noopener noreferrer"&gt;Serverless quickstart&lt;/a&gt;; the configuration surface that matters for production is what comes next.&lt;/p&gt;

&lt;p&gt;Runpod maps every &lt;code&gt;AsyncEngineArgs&lt;/code&gt; field to an &lt;a href="https://docs.runpod.io/serverless/workers/handler-functions#asynchronous-handlers" rel="noopener noreferrer"&gt;uppercase environment variable&lt;/a&gt; of the same name, so any launch-script flag has a configuration-panel equivalent that is editable without redeploying. The endpoint exposes an &lt;a href="https://docs.runpod.io/serverless/vllm/openai-compatibility" rel="noopener noreferrer"&gt;OpenAI-compatible API&lt;/a&gt; at &lt;code&gt;https://api.runpod.ai/v2/&amp;lt;ENDPOINT_ID&amp;gt;/openai/v1&lt;/code&gt;, which the OpenAI SDK consumes without code changes:&lt;/p&gt;

&lt;p&gt;from openai import OpenAI&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;client &lt;span class="se"&gt;\=&lt;/span&gt; OpenAI&lt;span class="o"&gt;(&lt;/span&gt;  
    api&lt;span class="se"&gt;\_&lt;/span&gt;&lt;span class="nv"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-runpod-api-key"&lt;/span&gt;,  
    base&lt;span class="se"&gt;\_&lt;/span&gt;&lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s2"&gt;https://api.runpod.ai/v2/&lt;/span&gt;&lt;span class="se"&gt;\&amp;gt;\&amp;lt;&lt;/span&gt;&lt;span class="s2"&gt;ENDPOINT&lt;/span&gt;&lt;span class="se"&gt;\_&lt;/span&gt;&lt;span class="s2"&gt;ID&lt;/span&gt;&lt;span class="se"&gt;\&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;/openai/v1"&lt;/span&gt;,  
&lt;span class="o"&gt;)&lt;/span&gt;

completion &lt;span class="se"&gt;\=&lt;/span&gt; client.chat.completions.create&lt;span class="o"&gt;(&lt;/span&gt;  
    &lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mistralai/Mistral-7B-Instruct-v0.3"&lt;/span&gt;,  
    &lt;span class="nv"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="se"&gt;\[&lt;/span&gt;  
        &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"role"&lt;/span&gt;: &lt;span class="s2"&gt;"user"&lt;/span&gt;, &lt;span class="s2"&gt;"content"&lt;/span&gt;: &lt;span class="s2"&gt;"Summarize the trade-offs of FP8 KV cache quantization."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;  
    &lt;span class="se"&gt;\]&lt;/span&gt;,  
    max&lt;span class="se"&gt;\_&lt;/span&gt;&lt;span class="nv"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;512,  
&lt;span class="o"&gt;)&lt;/span&gt;

print&lt;span class="o"&gt;(&lt;/span&gt;completion.choices&lt;span class="se"&gt;\[&lt;/span&gt;0&lt;span class="se"&gt;\]&lt;/span&gt;.message.content&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Billing is per-second of active compute, which makes serverless a useful target for ramp testing without committing to reserved capacity. One operational caveat: workers scale to zero between requests, so cold start (the interval from first request to first token on a freshly-initialized worker) ranges from roughly 30 seconds on cached images to 90+ seconds on first pull, before any inference latency. Run a warm-up request before recording p99 metrics.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Memory budgeting: multi-tenant discipline on shared GPUs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;GPU memory on shared infrastructure is best treated as a tenancy budget rather than a single number to dial. &lt;code&gt;--gpu-memory-utilization&lt;/code&gt; is the primitive that exposes the budget to vLLM, and the right value depends on what else lives on the device.&lt;/p&gt;

&lt;p&gt;On a shared node, every co-tenant (a monitoring agent, a sidecar model, a CUDA debugger) competes for the same headroom, and a peak utilization that worked in isolation can OOM in production. The discipline is to allocate a per-tenant headroom share before deciding the utilization value, then verify with &lt;code&gt;watch -n1 nvidia-smi --query-gpu=memory.used,memory.free,utilization.gpu --format=csv&lt;/code&gt; during a ramp test. Confirm that &lt;code&gt;memory.used&lt;/code&gt; at peak load stays within the tenant’s allocated share and that &lt;code&gt;memory.free&lt;/code&gt; never drops below the headroom you reserved for CUDA context and activation buffers. This headroom discipline is &lt;a href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/enterprise-aks-multi-instance-gpu-mig-vllm-deployment-guide/4450296" rel="noopener noreferrer"&gt;operator practice, not a vLLM feature&lt;/a&gt;; the framework gives you a budget knob and trusts you to know what fraction of the device is yours.&lt;/p&gt;

&lt;p&gt;Treating the budget as configuration that you version alongside the model and tenant changes is the practice that prevents the next incident. Platforms that surface it as a first-class endpoint setting (&lt;a href="https://docs.runpod.io/serverless/vllm/get-started" rel="noopener noreferrer"&gt;Runpod’s &lt;code&gt;GPU_MEMORY_UTILIZATION&lt;/code&gt; env var&lt;/a&gt; is one example) make the discipline easier; on a hand-rolled launch script the same value belongs in a checked-in config file, not in the bash history.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Measurement contract: TTFT, ITL, and ramp testing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Production vLLM deployments are bounded by a measurement contract the operator owes the SLA. Four quantities define the contract, and the protocol that verifies them is a ramp test against the actual model on representative traffic. Definitions and methodology belong together; separating them is what produces the dashboards that look healthy until production breaks them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TTFT (Time To First Token)&lt;/strong&gt; is the wall-clock interval from request arrival to the first token streamed back. It is dominated by &lt;a href="https://www.ibm.com/think/topics/time-to-first-token" rel="noopener noreferrer"&gt;prefill&lt;/a&gt;: the cost of pushing the entire input through every attention layer once. Sub-second TTFT is the correct target for interactive chat; multi-second TTFT is acceptable for batch summarization where no human is watching the cursor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ITL (Inter-Token Latency)&lt;/strong&gt; is the gap between successive output tokens during decode. &lt;strong&gt;TPOT&lt;/strong&gt; (&lt;a href="https://docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html" rel="noopener noreferrer"&gt;Time Per Output Token&lt;/a&gt;) is the mean of that distribution across the full output. Interactive UX tracks ITL consistency far more than mean TPOT, because users perceive cadence stalls more readily than variations in average rate, and a mean of 50 ms with a clean p99 reads smoother than a faster mean with a long tail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;End-to-end latency&lt;/strong&gt; is TTFT plus the sum of all ITLs across the response. SLAs typically cite this number, but it lags as a diagnostic: a healthy deployment shows p99 ITL within a small multiple of the median, and when that multiple stretches you are seeing the symptoms catalogued later in this guide (KV eviction, prefill-decode contention, communication stalls) before they show up in end-to-end numbers.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Reading the benchmark output&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;vLLM ships a benchmark harness in its source tree that measures all four quantities against a running server. If you installed via pip, clone the repo first: &lt;code&gt;git clone &amp;lt;https://github.com/vllm-project/vllm&amp;gt; &amp;amp;&amp;amp; cd vllm&lt;/code&gt;. Start the server in a separate terminal (&lt;code&gt;vllm serve &amp;lt;model&amp;gt; ...&lt;/code&gt;), then run the benchmark. The &lt;code&gt;--dataset-name sharegpt&lt;/code&gt; flag downloads the ShareGPT dataset on first use; substitute &lt;code&gt;--dataset-name random&lt;/code&gt; for air-gapped environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;benchmarks&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;benchmark&lt;/span&gt;\&lt;span class="n"&gt;_serving&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; \\\\  
    \&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;mistralai&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Mistral&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Instruct&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; \\\\  
    \&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; \\\\  
    \&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt; \\\\  
    \&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;sharegpt&lt;/span&gt; \\\\  
    \&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="n"&gt;localhost&lt;/span&gt; \\\\  
    \&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output reports mean, median, and p99 for TTFT, ITL, and TPOT, plus aggregate throughput in tokens per second. Read the p99 columns first. Mean values smooth over the eviction events and contention spikes that actually shape the user experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Ramp methodology&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A single-rate benchmark tells you whether one operating point is healthy. Finding the serving ceiling requires ramping. Step &lt;code&gt;--request-rate&lt;/code&gt; upward (1, 2, 4, 8, 16, …) and record p99 ITL at each step. The point where p99 ITL begins growing super-linearly with request rate is the ceiling for the current configuration. Beyond that point the deployment is capacity-constrained, most commonly due to KV pool pressure, scheduler oversubscription, or a combination of both. The configuration changes in the next section move that ceiling; the ramp test is what proves they did.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Configuration knobs: four flags ranked by impact&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once the deployment surface is fixed, four flags do most of the work on a standard mixed-traffic deployment. Treat the order below as the baseline impact ranking; the “when this matters” line on each one is what you check before deciding to enable it. Two additional features for specific workload classes follow in the next section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quantization (&lt;code&gt;--quantization awq&lt;/code&gt;).&lt;/strong&gt; Largest single memory win available. AWQ and GPTQ cut weight footprint by half (INT8) or 75% (INT4) relative to BF16, with quality degradation that is model- and benchmark-dependent but usually small for instruction-tuned models on standard tasks. &lt;a href="https://arxiv.org/abs/2306.00978" rel="noopener noreferrer"&gt;AWQ (Activation-aware Weight Quantization)&lt;/a&gt; calibrates against activation distributions rather than applying static rounding, which generally produces better outputs at the same bit width.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve mistralai/Mistral-7B-Instruct-v0.3 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-quantization&lt;/span&gt; awq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--quantization awq&lt;/code&gt; flag expects the model checkpoint to already be in AWQ format. Pointing it at a standard BF16 checkpoint will produce a runtime error, not a silent quality degradation. Search the Hub for a &lt;code&gt;*-AWQ&lt;/code&gt; variant of your model, or run a post-hoc quantization pass with &lt;a href="https://github.com/casper-hansen/AutoAWQ" rel="noopener noreferrer"&gt;AutoAWQ&lt;/a&gt; before serving.&lt;/p&gt;

&lt;p&gt;When this matters: any deployment where weights are crowding the KV pool or where you want headroom for higher concurrency without moving to a larger GPU. Verify the chosen model has an AWQ checkpoint on the &lt;a href="https://huggingface.co/models?search=awq" rel="noopener noreferrer"&gt;Hub&lt;/a&gt;; if not, GPTQ is the post-hoc alternative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FP8 KV cache (&lt;code&gt;--kv-cache-dtype fp8&lt;/code&gt;).&lt;/strong&gt; Storing KV in FP8 instead of BF16 &lt;a href="https://vllm.ai/blog/fp8-kvcache" rel="noopener noreferrer"&gt;halves cache memory&lt;/a&gt;; at 64K context the KV-cache footprint that previously consumed roughly 8 GB drops to about 4 GB on the running model. Quality degradation is &lt;a href="https://docs.vllm.ai/en/latest/features/quantization/quantized_kvcache/" rel="noopener noreferrer"&gt;measurable but small&lt;/a&gt; on standard benchmarks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve mistralai/Mistral-7B-Instruct-v0.3 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-kv-cache-dtype&lt;/span&gt; fp8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FP8 KV cache is natively accelerated on H100 (Hopper) GPUs. On A100 and L40S (Ampere/Ada), vLLM falls back to software emulation which still saves memory but at reduced throughput gains. Verify the behavior on your GPU tier before assuming compute neutrality.&lt;/p&gt;

&lt;p&gt;When this matters: long-context workloads where the KV pool, not the weights, is the binding budget. At 4K-8K context the savings are real but rarely change the concurrency story.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prefix caching (&lt;code&gt;--enable-prefix-caching&lt;/code&gt;).&lt;/strong&gt; vLLM hashes the token sequence of each KV block and reuses materialized blocks across requests with shared prefixes. A &lt;a href="https://bentoml.com/llm/inference-optimization/prefix-caching" rel="noopener noreferrer"&gt;multi-tenant chat system with a common system prompt&lt;/a&gt; or a RAG pipeline that retrieves from a small corpus pays prefill once for the shared portion instead of every request. The fraction of prefill compute eliminated is workload-dependent and tracks the prefix-overlap rate of your traffic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve mistralai/Mistral-7B-Instruct-v0.3 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-enable-prefix-caching&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this matters: any workload with non-trivial prompt-prefix overlap, including agentic systems that send the same tool definitions on every call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chunked prefill (&lt;code&gt;--enable-chunked-prefill&lt;/code&gt;).&lt;/strong&gt; Splits long prefill phases into smaller chunks and interleaves them with decode steps from in-flight sequences. Without it, a single &lt;a href="https://developers.redhat.com/articles/2026/03/09/5-steps-triage-vllm-performance" rel="noopener noreferrer"&gt;10K-token prefill stalls decode&lt;/a&gt; for every concurrent sequence for the duration, which surfaces as a visible ITL spike. With it, prefill is budgeted across iterations at some TTFT cost on the prefilling request (tunable via max_num_batched_tokens) and steady ITL for everyone else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve mistralai/Mistral-7B-Instruct-v0.3 &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-enable-prefix-caching&lt;/span&gt; &lt;span class="se"&gt;\\\\&lt;/span&gt;  
    &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-enable-chunked-prefill&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this matters: mixed workloads where chat traffic and long-document requests share the same endpoint. The TTFT tradeoff on the prefilling request is small relative to the ITL stability it buys for concurrent sequences.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Speculative decoding and multi-LoRA: throughput levers for specific workloads&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Two 2025-era features change the throughput story for specific workload classes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.vllm.ai/en/latest/features/speculative_decoding/" rel="noopener noreferrer"&gt;&lt;strong&gt;Speculative decoding&lt;/strong&gt;&lt;/a&gt; runs a small draft model in front of the target model to propose tokens that the target then &lt;a href="https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/" rel="noopener noreferrer"&gt;verifies in parallel&lt;/a&gt;. On workloads where the draft model agrees with the target most of the time (consistent prose, predictable code), the verification step accepts multiple drafted tokens per target step, which raises effective decode throughput without changing output quality. The win shrinks on outputs the &lt;a href="https://bentoml.com/llm/inference-optimization/speculative-decoding" rel="noopener noreferrer"&gt;draft model handles poorly&lt;/a&gt;, so the feature pays back on workload classes more than on benchmarks.&lt;/p&gt;

&lt;p&gt;The relevant flags are &lt;code&gt;--speculative-model &amp;lt;draft-model-id&amp;gt;&lt;/code&gt; and &lt;code&gt;--num-speculative-tokens &amp;lt;N&amp;gt;&lt;/code&gt; (typically 3-5). The draft model must match the tokenizer of the target. VRAM overhead is the full weight footprint of the draft model in addition to the base.&lt;/p&gt;

&lt;p&gt;When to use: latency-sensitive workloads where you can afford the draft-model VRAM and where the target’s outputs are predictable enough for the draft to agree often. Verify current support and operator semantics in the &lt;a href="https://docs.vllm.ai/en/latest/" rel="noopener noreferrer"&gt;vLLM documentation&lt;/a&gt; before committing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.vllm.ai/en/latest/features/lora/" rel="noopener noreferrer"&gt;&lt;strong&gt;Multi-LoRA serving&lt;/strong&gt;&lt;/a&gt; lets a single vLLM instance host the base model once and swap in LoRA adapters per request. For deployments serving many fine-tuned variants of the same base, this collapses the GPU footprint of “one endpoint per adapter” into &lt;a href="https://blog.vllm.ai/2026/02/26/multi-lora.html" rel="noopener noreferrer"&gt;“one endpoint, many adapters.”&lt;/a&gt; The tradeoff is per-request adapter loading latency on cold paths; pre-loading adapters with a dummy warm-up request mitigates this, and you should check the docs for your target vLLM version.&lt;/p&gt;

&lt;p&gt;Enable with &lt;code&gt;--enable-lora&lt;/code&gt;. Register adapters at startup via &lt;code&gt;--lora-modules &amp;lt;name&amp;gt;=&amp;lt;path-or-hub-id&amp;gt;&lt;/code&gt; (repeatable). Control concurrency with &lt;code&gt;--max-loras&lt;/code&gt; and &lt;code&gt;--max-cpu-loras&lt;/code&gt;. Adapters not listed at startup can be loaded dynamically via the &lt;code&gt;/v1/load_lora_adapter&lt;/code&gt; endpoint (vLLM 0.5+).&lt;/p&gt;

&lt;p&gt;When to use: SaaS deployments with per-tenant fine-tunes on a shared base, or any catalog of LoRA variants where one-endpoint-per-adapter is operationally untenable.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Failure modes: KV eviction, prefill-decode contention, OOM&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Three failure modes account for most production vLLM regressions. Each entry pairs an observable symptom with the root cause and the remediation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KV cache eviction.&lt;/strong&gt; &lt;em&gt;Symptom:&lt;/em&gt; p99 ITL spikes to several multiples of the median while mean throughput holds; vLLM logs show “number of free blocks” trending toward zero. &lt;em&gt;Cause:&lt;/em&gt; the &lt;a href="https://blog.vllm.ai/2025/09/05/anatomy-of-vllm.html" rel="noopener noreferrer"&gt;block allocator has run out of free blocks&lt;/a&gt; and is preempting in-flight sequences, which then need to recompute their KV state when re-admitted. &lt;em&gt;Fix:&lt;/em&gt; lower &lt;code&gt;--max-model-len&lt;/code&gt; to the actual maximum your application needs, reduce &lt;code&gt;--gpu-memory-utilization&lt;/code&gt; only if another process on the device is competing for the same memory budget, or move to a larger GPU. Enabling &lt;code&gt;--kv-cache-dtype fp8&lt;/code&gt; reduces the per-token KV cache cost by roughly half (the vLLM blog reports reduction to ~54% of BF16 in best cases) and is often sufficient for long-context workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prefill-decode contention.&lt;/strong&gt; &lt;em&gt;Symptom:&lt;/em&gt; &lt;a href="https://huggingface.co/blog/tngtech/llm-performance-prefill-decode-concurrent-requests" rel="noopener noreferrer"&gt;ITL spikes correlated with the arrival of long-prompt requests&lt;/a&gt; rather than with overall load; mean ITL is fine but the distribution has visible tails after every long prompt. &lt;em&gt;Cause:&lt;/em&gt; prefill is &lt;a href="https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/" rel="noopener noreferrer"&gt;compute-bound on dense matmuls&lt;/a&gt; against long token sequences, decode is memory-bandwidth-bound on matrix-vector products, and a scheduler running both on one GPU has to switch between profiles inside a single iteration. &lt;em&gt;Fix:&lt;/em&gt; &lt;code&gt;--enable-chunked-prefill&lt;/code&gt; budgets prefill across iterations and is the first remediation. If contention persists at high concurrency with mixed prompt lengths, the architectural answer is to split prefill and decode onto &lt;a href="https://docs.vllm.ai/en/latest/features/disagg_prefill/" rel="noopener noreferrer"&gt;different instances&lt;/a&gt;, covered in the closing section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Out-of-memory at admission.&lt;/strong&gt; &lt;em&gt;Symptom:&lt;/em&gt; &lt;a href="https://github.com/vllm-project/vllm/issues/32193" rel="noopener noreferrer"&gt;CUDA OOM during high-concurrency bursts&lt;/a&gt;; the engine refuses new requests rather than running them slowly. &lt;em&gt;Cause:&lt;/em&gt; &lt;a href="https://docs.vllm.ai/projects/vllm-omni/en/stable/configuration/gpu_memory_utilization/" rel="noopener noreferrer"&gt;weights, KV pool, activation memory, and CUDA context&lt;/a&gt; together exceeded the budget set by &lt;code&gt;--gpu-memory-utilization&lt;/code&gt;. The static-allocation case is the classic example: a slot-per-sequence allocator at long &lt;code&gt;max_seq_len&lt;/code&gt; reserves so much KV pool per slot that a fourth or fifth request cannot be admitted even though their working sets would fit. With PagedAttention the equivalent failure is reaching pool exhaustion, which manifests as eviction first; hard OOM can follow when additional memory pressure pushes usage past the allocated budget. &lt;em&gt;Fix:&lt;/em&gt; recompute the budget from first principles (weights bytes + KV pool budget at chosen &lt;code&gt;--max-model-len&lt;/code&gt; + 5-10% headroom) and confirm with a ramp test before declaring the configuration shipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tensor-parallel communication stalls.&lt;/strong&gt; &lt;em&gt;Symptom:&lt;/em&gt; p99 latency on multi-GPU deployments is disproportionately high relative to single-GPU baselines after accounting for the weight-shard benefit; throughput is sensitive to &lt;code&gt;--tensor-parallel-size&lt;/code&gt; beyond what shard math predicts. &lt;em&gt;Cause:&lt;/em&gt; inter-GPU activation transfers at each layer boundary are constrained by PCIe bandwidth (typically 64 GB/s bidirectional) instead of NVLink (600+ GB/s on H100 NVLink4). &lt;em&gt;Fix:&lt;/em&gt; verify GPU interconnect topology with &lt;code&gt;nvidia-smi topo -m&lt;/code&gt;. If GPUs are PCIe-only, the throughput loss is architectural; mitigation is tensor-parallel-size reduction (to minimize cross-GPU transfers) or migration to NVLink-bridged hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Production observability: vLLM metrics, Prometheus, and alertable thresholds&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Observability for a production vLLM deployment is layered. vLLM exposes a Prometheus-format metrics endpoint at &lt;code&gt;http://&amp;lt;host&amp;gt;:8000/metrics&lt;/code&gt; by default (same port as the OpenAI-compatible API, no additional flag required) that surfaces request and KV-cache state; GPU-level tools sit underneath as the second layer. A minimal Prometheus scrape config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scrape&lt;span class="se"&gt;\_&lt;/span&gt;configs:  
&lt;span class="se"&gt;\-&lt;/span&gt;job&lt;span class="se"&gt;\_&lt;/span&gt;name: vllm  
static&lt;span class="se"&gt;\_&lt;/span&gt;configs:  
&lt;span class="se"&gt;\-&lt;/span&gt;targets:&lt;span class="se"&gt;\[&lt;/span&gt;&lt;span class="s1"&gt;'localhost:8000'&lt;/span&gt;&lt;span class="se"&gt;\]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following metric names are accurate as of vLLM 0.20.x. Verify against &lt;code&gt;/metrics&lt;/code&gt; on your running instance; names have changed between minor versions. Four metrics carry most of the alerting signal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;KV cache utilization (&lt;code&gt;vllm:gpu_cache_usage_perc&lt;/code&gt;).&lt;/strong&gt; Fraction 0-1 representing cache pool consumption. The leading indicator for eviction. Alert when sustained usage exceeds 0.85, well before eviction starts. This metric is the dashboard companion to the eviction failure mode.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pending request queue depth (&lt;code&gt;vllm:num_requests_waiting&lt;/code&gt;).&lt;/strong&gt; The leading indicator for scheduler oversubscription. A &lt;a href="https://github.com/vllm-project/vllm/issues/18826" rel="noopener noreferrer"&gt;queue that grows without bounding&lt;/a&gt; indicates the deployment is past its serving ceiling and ramping admission is what’s needed, not more tuning.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request TTFT and ITL distributions (&lt;code&gt;vllm:time_to_first_token_seconds&lt;/code&gt;, &lt;code&gt;vllm:time_per_output_token_seconds&lt;/code&gt;).&lt;/strong&gt; The end-user-facing contract. Alert on p99 thresholds tied to the bands defined in the measurement contract, not on means.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU memory utilization and SM activity.&lt;/strong&gt; Underlying-resource view. &lt;code&gt;nvidia-smi&lt;/code&gt;, &lt;code&gt;nvitop&lt;/code&gt;, or &lt;a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/latest/dcgm-exporter.html" rel="noopener noreferrer"&gt;DCGM exporters&lt;/a&gt; fill this layer. Useful when investigating whether contention is on the device or in the scheduler.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alert thresholds should cite the SLA bands defined in the measurement contract rather than carrying their own copies; one source of truth keeps the dashboard from drifting away from the contract over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Pre-launch checklist: validation steps and the disaggregated-serving roadmap&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before the endpoint takes production traffic, run through this short list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;-max-model-len&lt;/code&gt; set to the actual maximum context your application uses, not the model’s architectural ceiling (128K is typical for Llama-3.1 and Qwen2.5 class models, which silently inherit it on a default launch).
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-gpu-memory-utilization&lt;/code&gt; reduced from the default of 0.92 if the device is shared, with the per-tenant share documented somewhere your on-call can find it.
&lt;/li&gt;
&lt;li&gt;Ramp test against &lt;code&gt;benchmark_serving.py&lt;/code&gt; on representative traffic, with p99 ITL recorded at each rate up to the target concurrency.
&lt;/li&gt;
&lt;li&gt;Prometheus scrape configured for the vLLM metrics endpoint and alerts wired to the thresholds in the measurement contract.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Disaggregated prefill-decode serving is the architectural answer to the contention failure mode for workloads that have outgrown what &lt;code&gt;--enable-chunked-prefill&lt;/code&gt; can absorb. The direction is toward multi-node deployments that route prefill to compute-optimized instances and decode to memory-bandwidth-optimized instances. Production readiness for any given vLLM version belongs in the docs, not in this guide; check before planning a deployment around disaggregated prefill-decode serving.&lt;/p&gt;

&lt;p&gt;For a validated path from model selection through VRAM sizing and environment-variable configuration, &lt;a href="https://docs.runpod.io/serverless/workers/vllm/get-started" rel="noopener noreferrer"&gt;Runpod’s serverless vLLM documentation&lt;/a&gt; walks through the full setup against the same knobs ranked above.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
    <item>
      <title>Oracle Database Performance Monitoring: A Practitioner's Decision Framework</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Wed, 20 May 2026 10:46:47 +0000</pubDate>
      <link>https://dev.to/damasosanoja/oracle-database-performance-monitoring-a-practitioners-decision-framework-2f5i</link>
      <guid>https://dev.to/damasosanoja/oracle-database-performance-monitoring-a-practitioners-decision-framework-2f5i</guid>
      <description>&lt;p&gt;Oracle exposes a deep diagnostic surface: &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/gathering-database-statistics.html" rel="noopener noreferrer"&gt;AWR snapshots&lt;/a&gt;, &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/analyzing-sampled-data.html" rel="noopener noreferrer"&gt;ASH samples&lt;/a&gt;, wait event histograms, &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/automatic-performance-diagnostics.html" rel="noopener noreferrer"&gt;ADDM recommendations&lt;/a&gt;, alert log entries, and hundreds of &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/refrn/dynamic-performance-v-views-2.html" rel="noopener noreferrer"&gt;V$ dynamic performance views&lt;/a&gt;. Every signal stops at the database boundary, which is where the hardest production cases tend to live.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;Concurrency&lt;/code&gt; wait spike during WebLogic connection pool exhaustion produces the same AWR output as genuine latch contention under steady-state load. A &lt;code&gt;db file sequential read&lt;/code&gt; climbing on index range scans could mean a bad execution plan, or a storage array adding tens of milliseconds of latency because a batch ETL job on a separate system saturated the backend. Each example follows the same pattern: Oracle tells you &lt;em&gt;what&lt;/em&gt; the database is waiting for; infrastructure and application data tells you &lt;em&gt;why&lt;/em&gt;. Closing that gap means putting database events on the same timeline as the surrounding stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.manageengine.com/it-operations-management/database-monitoring.html" rel="noopener noreferrer"&gt;ManageEngine OpManager Nexus&lt;/a&gt; does exactly that, surfacing WebLogic, OCI, storage, and network signals alongside Oracle metrics in a single console.&lt;/p&gt;

&lt;p&gt;This guide is a decision framework for Oracle &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/index.html" rel="noopener noreferrer"&gt;19c&lt;/a&gt; performance monitoring: wait event triage, V$ metric interpretation, tablespace capacity, and alert routing.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWR report navigation
&lt;/h2&gt;

&lt;p&gt;A typical &lt;a href="https://docs.oracle.com/en-us/iaas/performance-hub/doc/awr-report-ui.html" rel="noopener noreferrer"&gt;AWR report&lt;/a&gt; runs to dozens of sections, but three carry the diagnostic weight for most investigations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load Profile&lt;/strong&gt; gives you execution rate, transaction rate, and logical/physical read rates &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/measuring-database-performance.html" rel="noopener noreferrer"&gt;normalized per second and per transaction&lt;/a&gt;. Comparing a healthy snapshot against a degraded one with these metrics is the fastest way to tell whether a performance change is driven by workload volume or by execution efficiency degradation on the same volume. Two of its values, &lt;strong&gt;DB Time&lt;/strong&gt; and &lt;strong&gt;DB CPU&lt;/strong&gt;, feed the triage ratio in the next section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Top 5 Timed Events&lt;/strong&gt; ranks the wait events that consumed the most DB Time during the snapshot, in absolute seconds and as a percentage of total DB Time. Map the dominant events to the wait class decision table for routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Ordered by Elapsed Time&lt;/strong&gt; identifies the individual SQL statements responsible for the highest DB Time consumption. Cross-reference with Top 5 Events to separate query-specific bottlenecks from systemic ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/gathering-database-statistics.html" rel="noopener noreferrer"&gt;AWR is historical by design&lt;/a&gt;. During an active incident, pair AWR findings with ASH data to see which sessions are contributing to wait time right now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;wait_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;ACTIVE_SESSION_HISTORY&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sample_time&lt;/span&gt;   &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'15'&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;session_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'WAITING'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;wait_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the workflow itself is the bottleneck (generating an HTML report mid-incident, or reading AWR for a cloud-managed Oracle deployment where the provider gates direct access), OpManager Nexus streams these same metrics continuously and exposes equivalent collection through cloud APIs. The manual generation path remains available when needed, most often for Oracle Support cases or side-by-side snapshot comparison: &lt;code&gt;@$ORACLE_HOME/rdbms/admin/awrrpt.sql&lt;/code&gt; (prompts for report format and begin/end snapshot IDs from &lt;code&gt;DBA_HIST_SNAPSHOT&lt;/code&gt;) or &lt;code&gt;DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_HTML(l_dbid, l_inst_num, l_bid, l_eid)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Two caveats before applying any of this. AWR, ASH, and ADDM require the &lt;a href="https://redresscompliance.com/oracle-diagnostic-pack-and-tuning-pack-experts-guide/" rel="noopener noreferrer"&gt;Oracle Diagnostics Pack&lt;/a&gt; license (bundled with Enterprise Edition); the V$ queries used throughout this guide do not. Oracle 23ai Autonomous Database manages AWR automatically, so the snapshot mechanics above do not apply there.&lt;/p&gt;

&lt;p&gt;From here, these signals feed the triage path the next section lays out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The triage decision framework
&lt;/h2&gt;

&lt;p&gt;Performance triage follows two branches: a first cut on CPU-bound versus wait-heavy, then wait-class routing for the wait-heavy case.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F041skz79qqc9qjjfwnar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F041skz79qqc9qjjfwnar.png" alt="Decision Matrix" width="799" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  From AWR snapshot to action
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/measuring-database-performance.html" rel="noopener noreferrer"&gt;DB Time is the sum of all elapsed time across foreground sessions&lt;/a&gt;; DB CPU is the on-CPU subset. Their ratio is the first triage signal.&lt;/p&gt;

&lt;p&gt;When DB CPU dominates DB Time (above ~75% on OLTP as a starting point, calibrated to your environment), the workload is CPU-bound. SQL tuning or resource contention is the investigation path, and &lt;strong&gt;SQL Ordered by Elapsed Time&lt;/strong&gt; identifies the statements consuming the most DB Time.&lt;/p&gt;

&lt;p&gt;When the share drops well below that, the workload is wait-heavy and &lt;strong&gt;Top 5 Timed Events&lt;/strong&gt; becomes the primary diagnostic surface. Read the &lt;code&gt;%DB Time&lt;/code&gt; column first; raw wait counts mislead. Events consuming a high proportion are the ones to triage, smaller fractions are background noise.&lt;/p&gt;

&lt;p&gt;Cross-reference both views before acting. A &lt;code&gt;db file sequential read&lt;/code&gt; dominant event paired with a top SQL doing millions of single-block reads is a query-specific candidate. The same wait dominant with a simple-lookup top SQL points at a systemic bottleneck (storage, cache pressure) rather than the query itself.&lt;/p&gt;

&lt;p&gt;With the dominant wait class identified, the next subsection routes each class to its investigation pathway and infrastructure check.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wait class decision table
&lt;/h3&gt;

&lt;p&gt;Of Oracle 19c's &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/refrn/classes-of-wait-events.html" rel="noopener noreferrer"&gt;13 wait classes&lt;/a&gt;, the table below covers the ones that surface in production OLTP triage. The Idle row is included for diagnostic context, not as a bottleneck.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Wait Class&lt;/th&gt;
&lt;th&gt;Common Events&lt;/th&gt;
&lt;th&gt;Root Cause Pathway&lt;/th&gt;
&lt;th&gt;Infrastructure Check&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System I/O&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;db file sequential read&lt;/code&gt;, &lt;code&gt;db file scattered read&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Index scan latency or full table scan I/O&lt;/td&gt;
&lt;td&gt;Storage IOPS, latency, bandwidth utilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;library cache lock&lt;/code&gt;, &lt;code&gt;buffer busy waits&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Hard-parse storms, hot segment blocks, DDL contention&lt;/td&gt;
&lt;td&gt;Application deployment timeline, WebLogic thread pool state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;log file sync&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Redo write latency, log writer contention&lt;/td&gt;
&lt;td&gt;Storage throughput on redo log volumes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application&lt;/td&gt;
&lt;td&gt;&lt;code&gt;enq: TX - row lock contention&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Application-level lock design&lt;/td&gt;
&lt;td&gt;Transaction duration in application logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cluster (RAC)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gc buffer busy acquire&lt;/code&gt;, &lt;code&gt;gc cr block 2-way&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Interconnect saturation, cross-instance data block contention&lt;/td&gt;
&lt;td&gt;Private interconnect throughput and latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle (diagnostic)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SQL*Net message from client&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Application think time, connection pool sizing&lt;/td&gt;
&lt;td&gt;Connection pool metrics, application round-trip count&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The query below produces a class-level wait distribution as triage input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;wait_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;session_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds_in_wait&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_wait_sec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distinct_events&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;SESSION&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'WAITING'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;wait_class&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Idle'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;wait_class&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;-- adjust threshold for your concurrency level&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;session_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here, each dominant class has its own triage pathway, covered in the deep dives that follow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait event deep dives by class
&lt;/h2&gt;

&lt;p&gt;Each dive follows the same shape: Oracle wait data first, then the infrastructure check that names the actual root cause. Oracle's &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/descriptions-of-wait-events.html" rel="noopener noreferrer"&gt;wait event descriptions reference&lt;/a&gt; catalogs every event below; the focus here is triage, not definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  System I/O: physical read events
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;db file sequential read&lt;/code&gt; fires on index range scans, where Oracle reads one block at a time from a specific index entry to its corresponding table block. High wait time with well-tuned execution plans points to storage latency rather than query structure.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;db file scattered read&lt;/code&gt; fires on full table scans, where Oracle reads multiple contiguous blocks in a single I/O. Elevated wait time here means either the full scans are expected (large analytical queries with &lt;code&gt;db_file_multiblock_read_count&lt;/code&gt; tuned for the workload) or missing indexes are forcing full scans where range scans would be more selective.&lt;/p&gt;

&lt;p&gt;Both events can spike from storage subsystem saturation with no change in query execution paths. When System I/O waits climb alongside elevated storage latency on the host, the fix belongs at the infrastructure layer rather than in SQL. OpManager Nexus shortens that diagnostic by surfacing Oracle wait data and host storage metrics on the same timeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Concurrency: library cache and hot block contention
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;library cache lock&lt;/code&gt; typically signals hard-parse storms where sessions compete to parse SQL statements that could be shared. First rule out &lt;a href="https://support.oracle.com/knowledge/Oracle%20Database%20Products/2622615_1.html" rel="noopener noreferrer"&gt;DDL operations&lt;/a&gt; (ALTER TABLE, CREATE INDEX); these take an exclusive lock on affected objects and produce the same wait. If no DDL is concurrent, the fix is on the &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/tgsql/improving-rwp-cursor-sharing.html" rel="noopener noreferrer"&gt;&lt;code&gt;cursor_sharing&lt;/code&gt;&lt;/a&gt; side: setting it to &lt;code&gt;FORCE&lt;/code&gt; converts literals to bind variables and reduces hard parses, at the cost of potential plan instability where literal values affect cardinality estimates.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;buffer busy waits&lt;/code&gt; indicate multiple sessions competing to access the same buffer in the cache, often a symptom of hot segment blocks in high-concurrency OLTP workloads. Query &lt;code&gt;V$WAITSTAT&lt;/code&gt; for the block class with the highest counts to determine whether contention is in undo blocks, data blocks, or segment headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;wait_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;wait_time&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;WAITSTAT&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;Concurrency&lt;/code&gt; wait spike immediately after a new application deployment is a regression indicator pointing at code that generates unshared cursors. The same event during a maintenance window is background noise. Application-layer context (deployment timeline, WebLogic thread pool state) resolves the ambiguity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Commit: redo write latency
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;log file sync&lt;/code&gt; fires every time a session issues a COMMIT and waits for the log writer (LGWR) to flush the redo buffer to disk. High &lt;code&gt;log file sync&lt;/code&gt; times correlate directly with redo write latency. If redo log files sit on slow storage or share I/O bandwidth with datafiles, commit-heavy workloads stall here.&lt;/p&gt;

&lt;p&gt;Check redo log placement and storage throughput before tuning application commit frequency. The most direct measurement is the average wait on &lt;code&gt;log file parallel write&lt;/code&gt; from &lt;code&gt;V$SYSTEM_EVENT&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;total_waits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time_waited_micro&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_waits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_wait_ms&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_EVENT&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'log file parallel write'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sustained avg waits above the low-tens-of-milliseconds range indicate a log writer bottleneck or storage constraint on the redo volume. &lt;code&gt;V$SYSTEM_EVENT&lt;/code&gt; values are cumulative since instance startup; read them as deltas (see the V$ metrics reference for the cumulative-vs-delta rule).&lt;/p&gt;

&lt;h3&gt;
  
  
  Application: row lock contention
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;enq: TX - row lock contention&lt;/code&gt; fires when one session holds a row lock and another waits to modify the same row. The blocking pair is visible in &lt;code&gt;V$LOCK&lt;/code&gt; joined with &lt;code&gt;V$SESSION&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;waiting_sid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;waiting_user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lock_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;bk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocking_sid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;bs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocking_user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;bs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocking_status&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;LOCK&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;SESSION&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;LOCK&lt;/span&gt; &lt;span class="n"&gt;bk&lt;/span&gt;    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;bk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;
                 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;bk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id1&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id1&lt;/span&gt;
                 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;bk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id2&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id2&lt;/span&gt;
                 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;bk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;SESSION&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;bk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;bk.type = l.type&lt;/code&gt; predicate prevents a TX waiter from being paired with an unrelated TM or UL holder that happens to share &lt;code&gt;id1&lt;/code&gt;/&lt;code&gt;id2&lt;/code&gt;. For simpler diagnostics, &lt;code&gt;V$SESSION.BLOCKING_SESSION&lt;/code&gt; (10g+) returns the blocker's SID directly without the self-join, at the cost of losing the per-lock detail above.&lt;/p&gt;

&lt;p&gt;The fix belongs in the application layer: reduce transaction duration, reorder DML operations to minimize lock hold time, or redesign the data access pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Idle: client-side wait events
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;SQL*Net message from client&lt;/code&gt; (classified by Oracle as Idle, not Network) records time Oracle spends waiting for the client application to send the next request. A spike during a deployment window often indicates the new application version is doing more round-trips or holding connections open longer between statements. Seeing this event consume a significant fraction of DB Time (a quarter or more) with no application change is worth investigating as a potential connection leak.&lt;/p&gt;

&lt;p&gt;For network-layer correlation, OpManager Nexus surfaces network device metrics alongside the application and server data already in view. A &lt;code&gt;SQL*Net message from client&lt;/code&gt; spike coinciding with network saturation on the segment connecting application servers to the database tier is a diagnostic that requires both views.&lt;/p&gt;

&lt;p&gt;Beyond individual wait events, the V$ dynamic performance views provide a continuous metrics signal that complements AWR-based triage.&lt;/p&gt;

&lt;h2&gt;
  
  
  V$ metrics reference
&lt;/h2&gt;

&lt;p&gt;The table below consolidates the V$ metrics that correlate with actionable production incidents. Treat the values as starting points and calibrate against your environment's steady-state baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Threshold reference table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Normal Range&lt;/th&gt;
&lt;th&gt;Warning&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;th&gt;V$ Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Buffer Cache Hit Ratio&lt;/td&gt;
&lt;td&gt;Mid-90s % or above&lt;/td&gt;
&lt;td&gt;Drops below the mid-90s&lt;/td&gt;
&lt;td&gt;Below the high-80s&lt;/td&gt;
&lt;td&gt;V$SYSSTAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physical Reads/sec&lt;/td&gt;
&lt;td&gt;Workload baseline&lt;/td&gt;
&lt;td&gt;Elevated above baseline (e.g., 2x)&lt;/td&gt;
&lt;td&gt;Significantly elevated (e.g., 4x), calibrate against observed steady state&lt;/td&gt;
&lt;td&gt;V$SYSSTAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logical Reads/sec&lt;/td&gt;
&lt;td&gt;Workload baseline&lt;/td&gt;
&lt;td&gt;Elevated above baseline (e.g., 3x)&lt;/td&gt;
&lt;td&gt;Significantly elevated (e.g., 5x), calibrate against observed steady state&lt;/td&gt;
&lt;td&gt;V$SYSSTAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active User Sessions&lt;/td&gt;
&lt;td&gt;Operational baseline (commonly &amp;lt; 70% of max as a heuristic)&lt;/td&gt;
&lt;td&gt;Approaching limit (e.g., &amp;gt; 70% of max)&lt;/td&gt;
&lt;td&gt;Near limit (e.g., &amp;gt; 90% of max), calibrate for your environment&lt;/td&gt;
&lt;td&gt;V$SESSION&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DB CPU % of DB Time&lt;/td&gt;
&lt;td&gt;High ratio indicates CPU-bound OLTP workload (specific thresholds are practitioner heuristics, calibrate against your steady state)&lt;/td&gt;
&lt;td&gt;Significantly below steady state&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;V$SYSMETRIC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tablespace Used %&lt;/td&gt;
&lt;td&gt;&amp;lt; 80%&lt;/td&gt;
&lt;td&gt;&amp;gt; 80% (&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/spmsu/set-threshold-values-for-tablespace-alerts.html" rel="noopener noreferrer"&gt;Oracle default: 85%&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;&amp;gt; 90% (Oracle default: 97%)&lt;/td&gt;
&lt;td&gt;V$TABLESPACE / DBA_DATA_FILES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ASM Diskgroup Used %&lt;/td&gt;
&lt;td&gt;&amp;lt; 75% (operational heuristic for headroom; Oracle default critical: 90%)&lt;/td&gt;
&lt;td&gt;&amp;gt; 75%&lt;/td&gt;
&lt;td&gt;&amp;gt; 85%&lt;/td&gt;
&lt;td&gt;V$ASM_DISKGROUP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redo Write Latency (&lt;code&gt;log file parallel write&lt;/code&gt; avg)&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;td&gt;Sustained 15-20ms+&lt;/td&gt;
&lt;td&gt;&amp;gt; 50ms (environment-dependent)&lt;/td&gt;
&lt;td&gt;V$SYSTEM_EVENT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parse CPU / Parse Elapsed&lt;/td&gt;
&lt;td&gt;Close to 1.0&lt;/td&gt;
&lt;td&gt;Drops into the 0.7-0.8 range&lt;/td&gt;
&lt;td&gt;&amp;lt; 0.50 (indicative)&lt;/td&gt;
&lt;td&gt;V$SYSSTAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Datafile Status&lt;/td&gt;
&lt;td&gt;ONLINE&lt;/td&gt;
&lt;td&gt;Any OFFLINE&lt;/td&gt;
&lt;td&gt;Any RECOVER&lt;/td&gt;
&lt;td&gt;V$DATAFILE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/refrn/statistics-descriptions-2.html" rel="noopener noreferrer"&gt;Cumulative statistics like physical reads accumulate across the instance lifetime&lt;/a&gt;. AWR captures them as deltas between snapshots (&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_WORKLOAD_REPOSITORY.html" rel="noopener noreferrer"&gt;default interval: 60 minutes, default retention: 8 days&lt;/a&gt; on Oracle 19c). Real-time polling tools calculate these as per-second rates by tracking the delta across consecutive polls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics that need context
&lt;/h3&gt;

&lt;p&gt;When &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/tuning-database-buffer-cache.html" rel="noopener noreferrer"&gt;Buffer Cache Hit Ratio&lt;/a&gt; falls into the warning band, sessions are doing more physical reads than the SGA can absorb; expect correlated spikes in &lt;code&gt;db file sequential read&lt;/code&gt; and &lt;code&gt;db file scattered read&lt;/code&gt;. A very high ratio (above 99%) can mask SQL inefficiency in workloads with small working sets, so the exact inflection point is workload-dependent. Calculate it from the cache-specific counters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;SYSSTAT&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'physical reads cache'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s1"&gt;'db block gets from cache'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="s1"&gt;'consistent gets from cache'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'physical reads cache'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'db block gets from cache'&lt;/span&gt;   &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                 &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'consistent gets from cache'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cache_hit_ratio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The legacy formula using bare &lt;code&gt;physical reads&lt;/code&gt; counts direct-path reads (full table scans, parallel query, large LOB reads) as misses even though those reads bypass the buffer cache entirely; on mixed workloads, that depresses the ratio without indicating a real cache problem.&lt;/p&gt;

&lt;p&gt;Active sessions approaching the &lt;code&gt;sessions&lt;/code&gt; parameter limit (&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/refrn/SESSIONS.html" rel="noopener noreferrer"&gt;Oracle's default formula is &lt;code&gt;1.5 x PROCESSES + 22&lt;/code&gt;&lt;/a&gt;) is a leading indicator of connection pool misconfiguration or a connection leak. Check current utilization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;current_utilization&lt;/span&gt;                                &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;curr_sessions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;limit_value&lt;/span&gt;                                         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;limit_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;limit_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'UNLIMITED'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
            &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_NUMBER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_utilization&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                       &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_NUMBER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit_value&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;END&lt;/span&gt;                                                 &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;pct_used&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;RESOURCE_LIMIT&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;resource_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sessions'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Physical Reads and Logical Reads should be tracked as rates per second. A 5-minute polling interval catches transient spikes that a 60-minute AWR window would average out.&lt;/p&gt;

&lt;p&gt;Redo Write Latency is covered in the Commit subsection above (query and interpretation).&lt;/p&gt;

&lt;p&gt;Parse CPU / Parse Elapsed is the ratio of CPU time spent parsing to total elapsed parse time. A ratio near 1.0 means parses complete on CPU without waiting; that's the soft-parse signal. When the ratio drops well below 1.0, sessions are waiting on parse latches. No Oracle-documented standard exists for the cutoff, so calibrate against your environment's steady-state baseline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;parse_stats&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;SYSSTAT&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'parse time cpu'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'parse time elapsed'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'parse time cpu'&lt;/span&gt;     &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'parse time elapsed'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;parse_cpu_to_elapsed_ratio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;parse_stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;V$SYSSTAT&lt;/code&gt; values for parse time are in centiseconds. Both counters are cumulative since instance startup, so this query returns the lifetime average and will hide a 30-minute parse-latch storm completely. For an actionable real-time signal, capture two readings 5-10 minutes apart and compute the ratio of the deltas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/refrn/V-ASM_DISKGROUP.html" rel="noopener noreferrer"&gt;ASM Statistics&lt;/a&gt; expose &lt;code&gt;TOTAL_MB&lt;/code&gt;, &lt;code&gt;FREE_MB&lt;/code&gt;, and &lt;code&gt;USABLE_FILE_MB&lt;/code&gt; per diskgroup. For mirrored diskgroups, &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/ostmg/capacity-diskgroups.html" rel="noopener noreferrer"&gt;&lt;code&gt;USABLE_FILE_MB&lt;/code&gt;&lt;/a&gt; is the number that matters: a diskgroup showing 30% free by raw space may have far less usable capacity once mirror overhead is factored in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational baseline establishment
&lt;/h2&gt;

&lt;p&gt;Establishing those baselines takes a structured collection period, segmentation by workload window, and percentile-based threshold derivation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collection period.&lt;/strong&gt; A &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/tgdba/managing-baselines.html" rel="noopener noreferrer"&gt;baseline period&lt;/a&gt; of two to four weeks captures enough variation to account for weekly batch cycles, month-end processing, and workload fluctuations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workload window segmentation.&lt;/strong&gt; OLTP daytime hours and overnight batch windows produce different metric profiles. A buffer cache hit ratio that holds in the mid-90s during OLTP hours can drop significantly during a legitimate batch ETL run that scans large tables. Treat these as separate baselines rather than averaging them together; thresholds set against a blended average miss real anomalies during batch windows and generate false positives during OLTP hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Threshold derivation.&lt;/strong&gt; For metrics like physical reads/sec and active session count where "normal" varies by workload, derive thresholds from observed percentiles. A Warning threshold at the 95th percentile of your baseline period and a Critical threshold at the 99th percentile catches genuine anomalies while tolerating normal variance. The multipliers in the reference table are a reasonable starting point when no baseline data is available yet.&lt;/p&gt;

&lt;p&gt;To calculate percentiles from AWR data (requires Diagnostics Pack):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;PERCENTILE_CONT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;WITHIN&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;PERCENTILE_CONT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;WITHIN&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p99&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DBA_HIST_SYSMETRIC_HISTORY&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;metric_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Physical Reads Per Sec'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt;   &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30'&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the same pattern for each metric in your baseline period; filter on &lt;code&gt;end_time&lt;/code&gt; by time-of-day to segment OLTP versus batch windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic vs. static baselines.&lt;/strong&gt; Static baselines work when workload patterns are stable. For environments where workload volume shifts over time (growing user bases, seasonal traffic patterns, migration phases), &lt;a href="https://www.site24x7.com/what-is-database-monitoring.html" rel="noopener noreferrer"&gt;the SaaS delivery of OpManager Nexus&lt;/a&gt; offers AI-driven dynamic thresholds that adjust automatically without manual recalibration.&lt;/p&gt;

&lt;p&gt;With baselines in place, the same calibration logic carries into the capacity monitoring covered next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tablespace and storage capacity
&lt;/h2&gt;

&lt;p&gt;Utilization percentage alone misses the failure mode that actually pages people: growth running into a ceiling between polling cycles. The right primary signal is growth rate against the nearest ceiling, whether that's filesystem, &lt;code&gt;MAXSIZE&lt;/code&gt;, or ASM diskgroup capacity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Permanent tablespace monitoring
&lt;/h3&gt;

&lt;p&gt;Autoextend is the silent-failure mode. A tablespace with autoextend enabled will &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/admin/managing-data-files-and-temp-files.html" rel="noopener noreferrer"&gt;consume disk space&lt;/a&gt; until either the filesystem fills, the tablespace hits its &lt;code&gt;MAXSIZE&lt;/code&gt; limit, or the underlying ASM diskgroup runs out of usable capacity. When &lt;code&gt;MAXSIZE&lt;/code&gt; is set to &lt;code&gt;UNLIMITED&lt;/code&gt; (indicated by &lt;code&gt;MAXBYTES=0&lt;/code&gt; in &lt;code&gt;DBA_DATA_FILES&lt;/code&gt;), Oracle grows the datafile until the filesystem is full or the smallfile/bigfile platform maximum is reached (whichever comes first), with no Oracle-space threshold alert at the datafile level. By the time an &lt;a href="https://docs.oracle.com/en/error-help/db/ora-01653/" rel="noopener noreferrer"&gt;ORA-01653 (unable to extend table)&lt;/a&gt; or &lt;a href="https://docs.oracle.com/en/error-help/db/ora-01688/" rel="noopener noreferrer"&gt;ORA-01688 (unable to extend table partition)&lt;/a&gt; error appears in the alert log, sessions have already failed.&lt;/p&gt;

&lt;p&gt;This query exposes remaining capacity per tablespace, not just current consumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_size&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1073741824&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_gb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;used_space&lt;/span&gt;      &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1073741824&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;used_gb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_size&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;used_space&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block_size&lt;/span&gt;
        &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1073741824&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;remaining_gb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;used_percent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;used_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;used_percent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;remaining_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DBA_TABLESPACE_USAGE_METRICS&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;DBA_TABLESPACES&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_name&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;used_percent&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Track the growth trend (how many gigabytes per day a tablespace is growing) and alert before utilization reaches the autoextend ceiling. For licensed environments, calculate daily growth from AWR history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tablespace_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_usedsize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_usedsize&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1073741824&lt;/span&gt;
    &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_interval_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_interval_time&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_growth_gb_per_day&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DBA_HIST_TBSPC_SPACE_USAGE&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;DBA_HIST_SNAPSHOT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt;  &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;snap_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;snap_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbid&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;TABLESPACE&lt;/span&gt;    &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;DBA_TABLESPACES&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tablespace_name&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_interval_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;SYSDATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block_size&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;avg_growth_gb_per_day&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="n"&gt;NULLS&lt;/span&gt; &lt;span class="k"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without the Diagnostics Pack, capture periodic snapshots of &lt;code&gt;DBA_TABLESPACE_USAGE_METRICS&lt;/code&gt; to an external tracking table on a scheduled basis and compute deltas between rows.&lt;/p&gt;

&lt;p&gt;To identify datafiles with unbounded growth potential:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tablespace_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1073741824&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;current_size_gb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;maxbytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'UNLIMITED'&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;TO_CHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxbytes&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1073741824&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt;                                                  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_size_gb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;maxbytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;maxbytes&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1073741824&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt;                                                  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;growth_headroom_gb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DBA_DATA_FILES&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;autoextensible&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'YES'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;tablespace_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;MAXBYTES = 0&lt;/code&gt; indicates no explicit autoextend ceiling rather than literally unlimited capacity; Oracle still bounds the datafile at the smallfile-versus-bigfile platform maximum (~32 GB and ~32 TB respectively at 8 KB blocks), so factor that ceiling into capacity planning rather than treating "UNLIMITED" as truly unbounded.&lt;/p&gt;

&lt;h3&gt;
  
  
  TEMP tablespace monitoring
&lt;/h3&gt;

&lt;p&gt;TEMP tablespace exhaustion (&lt;code&gt;ORA-01652: unable to extend temp segment&lt;/code&gt;) is a frequent production incident. Large sort operations, hash joins, and global temporary table usage can exhaust TEMP space without advance warning. Set a warning threshold at 75-80% of configured TEMP size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;tablespace_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tablespace_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1048576&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;temp_total_mb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;allocated_space&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1048576&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;temp_allocated_mb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;free_space&lt;/span&gt;      &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1048576&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;temp_free_mb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;tablespace_size&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;free_space&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
             &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tablespace_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;temp_used_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DBA_TEMP_FREE_SPACE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;V$TEMP_SPACE_HEADER&lt;/code&gt; reports the allocation high-water mark from the tempfile header bitmap and does not reflect reclaimable extents. Oracle's lazy reclamation means freed sort segments stay marked used until a tempfile shrink or instance restart, so a query against it tends to alarm on healthy systems. &lt;code&gt;DBA_TEMP_FREE_SPACE&lt;/code&gt; accounts for free-but-not-released space and is the right source for capacity thresholds; for currently-active sort/hash usage during an incident, &lt;code&gt;V$SORT_SEGMENT&lt;/code&gt; or &lt;code&gt;V$TEMPSEG_USAGE&lt;/code&gt; shows what individual sessions are holding.&lt;/p&gt;

&lt;p&gt;Tablespace growth changes slowly enough that polling every 30-60 minutes is sufficient for most environments.&lt;/p&gt;

&lt;p&gt;From single-instance scope, the next section adds the RAC and Multitenant scoping rules that change how the V$ queries above return data.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAC and multitenant monitoring
&lt;/h2&gt;

&lt;p&gt;In RAC environments, the V$ queries shown throughout this guide return data for the local instance only. Use GV$ views (e.g., &lt;code&gt;GV$SESSION&lt;/code&gt;, &lt;code&gt;GV$WAITSTAT&lt;/code&gt;) and filter by &lt;code&gt;INST_ID&lt;/code&gt; to query across all nodes. For single-instance diagnostics on a specific RAC node, V$ queries remain correct but will not reflect wait events or sessions active on other nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster wait class events.&lt;/strong&gt; &lt;code&gt;gc buffer busy acquire&lt;/code&gt; and &lt;code&gt;gc cr block 2-way&lt;/code&gt; indicate sessions waiting for blocks to transfer across the private interconnect between RAC nodes. Elevated wait times point to interconnect saturation or cross-instance contention for the same data blocks. Check private interconnect throughput and consider partitioning or rebalancing workloads across nodes to reduce inter-node block shipping. OpManager Nexus surfaces these RAC metrics alongside node state and ASM diskgroup capacity in one view.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CDB/PDB V$ view scoping.&lt;/strong&gt; In Oracle 19c &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/multi/viewing-information-about-cdbs-and-pdbs-with-sql-plus.html" rel="noopener noreferrer"&gt;Multitenant architecture&lt;/a&gt; (non-CDB was deprecated in 21c), &lt;code&gt;V$SESSION&lt;/code&gt;, &lt;code&gt;V$SYSSTAT&lt;/code&gt;, and tablespace views return data scoped to the current container by default. Monitoring at the CDB root level shows aggregate metrics across all PDBs. For per-PDB visibility, configure a separate monitor for each PDB or use &lt;code&gt;CON_ID&lt;/code&gt; filtering in your queries. For example, to scope the session wait query to a specific PDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;wait_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;session_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds_in_wait&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_wait_sec&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;SESSION&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'WAITING'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;wait_class&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Idle'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;con_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;con_id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;PDBS&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'YOUR_PDB_NAME'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;wait_class&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;session_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OpManager Nexus supports automatic PDB discovery (set Discover Pluggable Database to Yes during monitor creation).&lt;/p&gt;

&lt;p&gt;Once the monitor is collecting from the right scope, the next section covers what to do when the values cross threshold.&lt;/p&gt;

&lt;h2&gt;
  
  
  Threshold configuration and alert routing
&lt;/h2&gt;

&lt;p&gt;Group-level or overall health alerts collapse multiple attribute states into a single signal; the result is alert noise and ambiguous routing. Per-attribute thresholds are the right unit for Oracle environments, where buffer cache hit ratio, tablespace utilization, physical reads, and wait event states each warrant a different response. OpManager Nexus supports this model directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting per-attribute thresholds
&lt;/h3&gt;

&lt;p&gt;OpManager Nexus uses four severity states for Oracle monitor attributes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Critical&lt;/strong&gt;: a confirmed issue requiring immediate action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Warning&lt;/strong&gt;: a potential issue that warrants attention but has not yet caused operational impact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear&lt;/strong&gt;: a previously triggered condition that has resolved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unknown&lt;/strong&gt;: displayed when the attribute value does not match any configured severity condition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the reference table from the V$ Metrics section as your starting point for per-attribute thresholds. Oracle's own &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/spmsu/set-threshold-values-for-tablespace-alerts.html" rel="noopener noreferrer"&gt;tablespace defaults (85%/97%)&lt;/a&gt; and ASM defaults (75%/90%) are more permissive than the table's recommendations, so decide whether your environment can tolerate that extra margin. Physical reads and session count thresholds require a baseline period before they are meaningful (see the Operational baseline establishment section).&lt;/p&gt;

&lt;p&gt;Tablespace statistics collection is configured at Settings &amp;gt; Performance Polling &amp;gt; Optimize Data Collection in OpManager Nexus. Select Oracle from the Monitor Type dropdown, then TableSpace Statistics from the metric dropdown. Two scheduling options: "Collect data in every polling" runs tablespace collection on every poll cycle (appropriate for high-growth OLTP environments); "Collect data at customized time interval" schedules collection at a fixed time (sufficient for stable OLAP or data warehouse tablespaces with predictable growth).&lt;/p&gt;

&lt;p&gt;Alert log collection follows the same path: Settings &amp;gt; Performance Polling &amp;gt; Optimize Data Collection, then select Oracle Alert Log from the metric dropdown. OpManager Nexus collects alert log entries on each poll and stores alert log history for a configurable retention period, which is useful for correlating a metrics anomaly with a specific Oracle error. You can suppress specific error patterns that are known-benign in your environment by entering them in the Errors to Ignore field under Settings &amp;gt; Performance Polling &amp;gt; Database Servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alert log monitoring
&lt;/h3&gt;

&lt;p&gt;The Oracle Alert Log surfaces errors that no metric will catch on its own. &lt;a href="https://docs.oracle.com/en/engineered-systems/health-diagnostics/autonomous-health-framework/ahfug/ora-00600.html" rel="noopener noreferrer"&gt;ORA-600&lt;/a&gt; (internal errors that typically warrant Oracle Support involvement), &lt;a href="https://docs.oracle.com/en/engineered-systems/health-diagnostics/autonomous-health-framework/ahfug/ora-04031.html" rel="noopener noreferrer"&gt;ORA-4031&lt;/a&gt; (shared pool memory exhaustion, which can trigger cascading parse failures), ORA-27xxx (I/O and OS errors), and media recovery events indicating datafile or redo log corruption are all signals that surface in the alert log before they appear in V$ metrics.&lt;/p&gt;

&lt;p&gt;Configure thresholds at the individual error pattern level where possible. ORA-600 and ORA-4031 warrant Critical severity and immediate escalation. ORA-12514 (TNS listener errors) may warrant Warning severity during maintenance windows but Critical at other times.&lt;/p&gt;

&lt;h3&gt;
  
  
  Webhook and incident management integration
&lt;/h3&gt;

&lt;p&gt;For routing alerts to your incident management platform, configure a webhook action. In OpManager Nexus, go to Admin &amp;gt; Alarm/Action &amp;gt; Actions and create a RestAPI Action. Provide your incident platform's webhook URL, set the form submission method to POST, and configure a JSON payload using OpManager Nexus's replaceable tags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$MONITORNAME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$HOSTNAME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attribute"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$ATTRIBUTE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$SEVERITY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$ATTRIBUTEVALUE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$RCAMSG_PLAINTEXT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$STRMODIFIEDTIME"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In OpManager Nexus's webhook configuration UI, tags are entered without backslashes (e.g., &lt;code&gt;$MONITORNAME&lt;/code&gt;, &lt;code&gt;$SEVERITY&lt;/code&gt;). The backslashes shown in some documentation sources are a rendering artifact.&lt;/p&gt;

&lt;p&gt;This payload includes &lt;code&gt;$SEVERITY&lt;/code&gt;, which passes the current alarm severity (Critical, Warning, or Clear) to the receiving system. When paired with a Clear event, this enables auto-resolution of tickets in any incident platform that accepts incoming webhooks.&lt;/p&gt;

&lt;p&gt;The ServiceDesk Plus integration creates tickets automatically when threshold conditions are met and resolves them when the alarm clears. It uses a dedicated REST API integration rather than the generic webhook action, but the auto-create/auto-close behavior is the same.&lt;/p&gt;

&lt;p&gt;With thresholds and routing handled, the closing section walks through the monitor setup that puts them into effect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitor setup and initial configuration
&lt;/h2&gt;

&lt;p&gt;To add your first Oracle Database monitor in OpManager Nexus, go to New Monitor and select Oracle DB Server under Database Servers. Enter the host IP or hostname, port, username, and a valid SID or host connection string, then set your polling interval.&lt;/p&gt;

&lt;p&gt;The monitoring user requires at minimum: CONNECT privilege, &lt;code&gt;SELECT_CATALOG_ROLE&lt;/code&gt; (covers &lt;code&gt;DBA_*&lt;/code&gt; views and most &lt;code&gt;V$&lt;/code&gt; views in 19c), and explicit grants on the underlying &lt;code&gt;V_$&lt;/code&gt; tables for any tooling that runs without role inheritance (definer's-rights stored procedures, for example, where roles are disabled at execution):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;V_&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;SESSION&lt;/span&gt;         &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;monitor_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;V_&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;SYSSTAT&lt;/span&gt;         &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;monitor_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;V_&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;SYSMETRIC&lt;/span&gt;       &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;monitor_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;V_&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_EVENT&lt;/span&gt;    &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;monitor_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;V_&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;WAITSTAT&lt;/span&gt;        &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;monitor_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;V_&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;RESOURCE_LIMIT&lt;/span&gt;  &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;monitor_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The list above is representative rather than exhaustive; other queries in this guide also touch &lt;code&gt;V$LOCK&lt;/code&gt;, &lt;code&gt;V$ACTIVE_SESSION_HISTORY&lt;/code&gt;, &lt;code&gt;V$TABLESPACE&lt;/code&gt;, and &lt;code&gt;V$PDBS&lt;/code&gt;, which &lt;code&gt;SELECT_CATALOG_ROLE&lt;/code&gt; already covers in role-aware contexts. Add the corresponding &lt;code&gt;V_$&lt;/code&gt; grants if your tooling cannot inherit role privileges, and &lt;code&gt;GRANT SELECT ON V_$TEMP_SPACE_HEADER TO monitor_user;&lt;/code&gt; if you retain the legacy TEMP query for ad-hoc debugging.&lt;/p&gt;

&lt;p&gt;For Multitenant environments, set Discover Pluggable Database to Yes to enumerate PDBs automatically. For RAC, select Oracle RAC Server instead of Oracle DB Server during monitor creation, provide either the Scan Host Name or the SCAN IP, and grant &lt;code&gt;GV_$&lt;/code&gt; equivalents to the monitoring user. OpManager Nexus's documentation lists the full grant set for its specific Oracle monitor implementation.&lt;/p&gt;

&lt;p&gt;After the monitor is active:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enable TableSpace Statistics and Oracle Alert Log collection (Settings &amp;gt; Performance Polling &amp;gt; Optimize Data Collection)&lt;/li&gt;
&lt;li&gt;Using the V$ metrics reference table as your starting point, configure per-attribute thresholds for buffer cache hit ratio, tablespace utilization, and physical reads&lt;/li&gt;
&lt;li&gt;Set up webhook or ServiceDesk Plus integration for alert routing&lt;/li&gt;
&lt;li&gt;Collect baseline data for a sufficient period (typically two to four weeks) before tightening multiplier-based thresholds for physical reads and session count&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For polling cadence: V$ metrics benefit from sub-AWR-interval polling, tablespace statistics tolerate lower-frequency intervals, and the alert log is best polled on every cycle so errors are captured within the polling window.&lt;/p&gt;

&lt;p&gt;The triage framework in this guide (from DB Time ratio to wait class routing to V$ metric thresholds) gives you a repeatable path from symptom to corrective action. OpManager Nexus places Oracle events on the same timeline as the surrounding stack across on-prem and SaaS deployments, which cuts the context-switching that extends incident resolution time.&lt;/p&gt;

</description>
      <category>backend</category>
      <category>database</category>
      <category>monitoring</category>
      <category>performance</category>
    </item>
    <item>
      <title>GPU cloud servers for AI workloads: how to choose the right instance and deploy without waste</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Thu, 07 May 2026 11:56:59 +0000</pubDate>
      <link>https://dev.to/damasosanoja/gpu-cloud-servers-for-ai-workloads-how-to-choose-the-right-instance-and-deploy-without-waste-54j7</link>
      <guid>https://dev.to/damasosanoja/gpu-cloud-servers-for-ai-workloads-how-to-choose-the-right-instance-and-deploy-without-waste-54j7</guid>
      <description>&lt;p&gt;Your team just hit VRAM OOM during a demo prep run. The A100 40GB you provisioned for a Llama-3-70B deployment looked fine on paper until the KV cache ballooned at 8K context. You could throw two H100s at it and move on, or you could run the 30 seconds of arithmetic you skipped before provisioning.&lt;/p&gt;

&lt;p&gt;Four decisions separate teams that run GPUs above 70% utilization from those idling at 35% while paying full price: workload classification, VRAM calculation, instance selection, and pricing model alignment. Get any of them wrong, and you’ll either hit a production ceiling or burn budget on capacity you can’t fill. Once all four are locked in, deployment is the execution step that wires them together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with your workload class, not the GPU spec sheet
&lt;/h2&gt;

&lt;p&gt;Workload classification comes first because training, fine-tuning, and inference each leave a different compute signature on the hardware, and that signature is what tells you which GPU to rent. The same &lt;a href="https://huggingface.co/meta-llama/Meta-Llama-3-70B" rel="noopener noreferrer"&gt;Llama-3-70B model&lt;/a&gt; behaves like three different problems depending on what you’re doing with it, and the cheapest viable instance changes accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full training&lt;/strong&gt; is the heaviest of the three because every parameter is in motion at once. Your GPU spends most of its time executing &lt;a href="https://docs.nvidia.com/doca/archive/doca-v1.3/Allreduce/index.html" rel="noopener noreferrer"&gt;Allreduce&lt;/a&gt; across data-parallel replicas and shuttling optimizer states between High-Bandwidth Memory &lt;a href="https://newsroom.lamresearch.com/high-bandwidth-memory-explained-semi-101?blog=true" rel="noopener noreferrer"&gt;(HBM)&lt;/a&gt; and compute units, sustained over hours or days. The memory cost compounds quickly: a model trained with &lt;a href="https://arxiv.org/abs/1711.05101" rel="noopener noreferrer"&gt;AdamW&lt;/a&gt; in mixed precision stores weights, gradients, first moments, and second moments, totaling 16-18 bytes per parameter depending on whether gradients are kept in FP16 or FP32. That’s why memory capacity caps your maximum batch size per device and memory bandwidth caps how fast weight updates land, and it’s also why most teams running on cloud GPUs avoid full training whenever a cheaper path exists.&lt;/p&gt;

&lt;p&gt;That cheaper path is usually &lt;strong&gt;fine-tuning with&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2106.09685" rel="noopener noreferrer"&gt;&lt;strong&gt;LoRA&lt;/strong&gt;&lt;/a&gt;, which keeps most of the base model out of the optimizer entirely. By freezing the base weights and training only low-rank decomposition matrices, LoRA collapses the parameter count that AdamW has to track: with rank=16 on &lt;a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B" rel="noopener noreferrer"&gt;Llama 3 8B&lt;/a&gt;, you’re training roughly 42 million parameters instead of 8 billion. The base model stays in BF16 (or FP16) on-device, the adapters themselves are negligible in size, and optimizer states only cover the trainable slice, which drops total VRAM to around 20GB for an 8B model. That’s a footprint a single A100 80GB can hold with room left for forward-pass activations, turning a multi-GPU job into a single-card one. Runpod’s &lt;a href="https://www.runpod.io/blog/llm-fine-tuning-gpu-guide" rel="noopener noreferrer"&gt;LLM fine-tuning GPU guide&lt;/a&gt; covers this workload class in depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt; flips the constraint again, because once training is done, the optimizer disappears and the bottleneck moves from capacity to bandwidth. The shape of that bottleneck depends on how you serve: batch inference maximizes throughput per dollar by packing more sequences into each forward pass and tolerating the latency needed to fill the batch, while real-time inference targets TTFT (time-to-first-token), which is FLOPS-limited during the prefill phase. Once prefill finishes, though, the workload changes character: the model enters the decode phase, where it generates one token at a time and inter-token latency scales with how fast the GPU can stream the KV cache off HBM. That’s the regime where memory bandwidth, not raw compute, sets the ceiling, and it’s why an &lt;a href="https://www.nvidia.com/en-us/data-center/h100/" rel="noopener noreferrer"&gt;H100 SXM’s 3.35 TB/s HBM3 bandwidth&lt;/a&gt; serves tokens faster than an A100’s 2.0 TB/s, with the gap widening as the KV cache grows with sequence length and batch size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data modality&lt;/strong&gt; then layers a second axis on top of those three signatures, because the workload class tells you what’s happening on the GPU but not what’s filling its memory, and modalities fill memory very differently. LLMs concentrate the pressure on context length: they’re &lt;a href="https://frankdenneman.nl/posts/2026-01-12-the-dynamic-world-of-llm-runtime-memory/" rel="noopener noreferrer"&gt;KV-cache-bound, with VRAM scaling against the number of tokens in flight&lt;/a&gt;, so an 8B model serving 32K-token sessions can need more memory than the same 8B model serving 2K-token chats. Diffusion models like SDXL push on the opposite lever, staying modest in parameter count (&lt;a href="https://arxiv.org/abs/2307.01952" rel="noopener noreferrer"&gt;the SDXL base model sits at approximately 3.5B parameters&lt;/a&gt; across &lt;a href="https://compvis.github.io/vunet/images/vunet.pdf" rel="noopener noreferrer"&gt;UNet and VAE&lt;/a&gt;, with the refiner adding 6.6B for the full pipeline) but ballooning with image resolution and batch size as the latent activations grow. Multimodal models like &lt;a href="https://llava-vl.github.io/" rel="noopener noreferrer"&gt;LLaVA&lt;/a&gt; sit at the intersection of those two pressures and pay both costs: the vision encoder produces image embeddings that inflate the effective sequence length before the language model ever sees the input, so the KV cache starts larger than a text prompt of the same nominal length would suggest, and you’ll hit VRAM limits at batch sizes that would serve a same-size pure-LLM without complaint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Calculate your VRAM before you provision
&lt;/h2&gt;

&lt;p&gt;Once you know your workload class and modality, the next question is how much memory the job actually needs, and that turns into a short arithmetic exercise before any instance gets provisioned. The inference VRAM formula is:&lt;/p&gt;

&lt;p&gt;VRAM = (N_params x bytes_per_param) + KV_cache_size + framework overhead (10-15%)&lt;/p&gt;

&lt;p&gt;The KV cache size formula is:&lt;/p&gt;

&lt;p&gt;KV_cache_size = 2 x num_layers x num_heads x head_dim x seq_len x batch_size x bytes_per_element&lt;/p&gt;

&lt;p&gt;Note that num_heads for GQA models refers to the KV head count, not the query head count (e.g., 8 for Llama-3-70B, not 64). You can find num_layers, num_heads (as num_key_value_heads), and head_dim in the model’s config.json on HuggingFace Hub.&lt;/p&gt;

&lt;p&gt;Example for Llama-3-70B at 4K context, batch size 8:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weights at BF16: 70B x 2 bytes = 140GB&lt;/li&gt;
&lt;li&gt;Weights at INT4 via &lt;a href="https://github.com/bitsandbytes-foundation/bitsandbytes" rel="noopener noreferrer"&gt;bitsandbytes&lt;/a&gt;: 70B x 0.5 bytes = 35GB&lt;/li&gt;
&lt;li&gt;KV cache at BF16: 2 x 80 layers x 8 KV heads x 128 head_dim x 4096 tokens x 8 batch x 2 bytes = approximately 10.7GB&lt;/li&gt;
&lt;li&gt;Framework overhead at BF16: 140GB x 0.12 = approximately 17GB&lt;/li&gt;
&lt;li&gt;Total at BF16: approximately 168GB (requires 2x H100 80GB or more with tensor parallelism)&lt;/li&gt;
&lt;li&gt;Total at INT4: approximately 35GB + 10.7GB KV cache + 5GB overhead = approximately 51GB (fits one A100 80GB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table below gives you the minimum per-precision VRAM numbers for LLM inference. All values include approximately 12% framework overhead. KV cache is excluded because it varies with sequence length and batch size, so add 2-10GB for typical serving configurations, or significantly more for long-context (8K+) or high-concurrency deployments.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Model Size&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;FP16/BF16&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;INT8&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;INT4&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Min Instance (FP16)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Min Instance (INT4)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;~18GB&lt;/td&gt;
&lt;td&gt;~9GB&lt;/td&gt;
&lt;td&gt;~5GB&lt;/td&gt;
&lt;td&gt;A100 40GB&lt;/td&gt;
&lt;td&gt;RTX 4090 24GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13B&lt;/td&gt;
&lt;td&gt;~29GB&lt;/td&gt;
&lt;td&gt;~14GB&lt;/td&gt;
&lt;td&gt;~8GB&lt;/td&gt;
&lt;td&gt;A100 40GB&lt;/td&gt;
&lt;td&gt;RTX 4090 24GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;34B&lt;/td&gt;
&lt;td&gt;~76GB&lt;/td&gt;
&lt;td&gt;~38GB&lt;/td&gt;
&lt;td&gt;~19GB&lt;/td&gt;
&lt;td&gt;A100 80GB&lt;/td&gt;
&lt;td&gt;A100 40GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;70B&lt;/td&gt;
&lt;td&gt;~157GB&lt;/td&gt;
&lt;td&gt;~78GB&lt;/td&gt;
&lt;td&gt;~40GB&lt;/td&gt;
&lt;td&gt;2x A100 80GB&lt;/td&gt;
&lt;td&gt;A100 80GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;These values cover inference weight loading only.&lt;/em&gt; If you’re fine-tuning instead, the numbers shift: &lt;strong&gt;full AdamW mixed-precision training multiplies FP16 weight VRAM by 8x&lt;/strong&gt;, while LoRA at rank=16 adds only about 4GB of combined overhead (activations, intermediate gradients, and optimizer states) on top of the frozen base model. Adjusting rank scales that overhead roughly linearly: rank=8 halves it with some quality cost, rank=32 doubles it for more expressivity.&lt;/p&gt;

&lt;p&gt;Here’s where that 8x multiplier comes from. AdamW in mixed precision stores five components per parameter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 bytes (FP16 weights)&lt;/li&gt;
&lt;li&gt;2 bytes (FP16 gradients)&lt;/li&gt;
&lt;li&gt;4 bytes (FP32 master weights)&lt;/li&gt;
&lt;li&gt;4 bytes (FP32 first moment)&lt;/li&gt;
&lt;li&gt;4 bytes (FP32 second moment)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That totals 16 bytes per parameter (18 bytes if your implementation keeps FP32 gradients separately). For an 8B model: 8B x 16 = 128GB minimum, which exceeds a single A100 80GB. This is exactly why LoRA’s reduction to approximately 42M trainable parameters at rank=16 on the same 8B model makes single-GPU fine-tuning viable.&lt;/p&gt;

&lt;p&gt;With your VRAM requirements calculated, the next step is matching them to actual hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Match the GPU architecture to your workload class
&lt;/h2&gt;

&lt;p&gt;A VRAM number on its own only tells you what fits, not what serves well, and two GPUs with the same 80GB sticker can give you very different throughput on the same model. Hardware specs differ enough across current GPU options that a poor choice creates production constraints you can’t optimize away later, so the next move is matching the workload signature from the first section to the architecture that actually runs it efficiently.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;VRAM&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Memory BW&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;BF16 TFLOPS&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Multi-GPU Link&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Ideal Workload&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;H100 SXM 80GB&lt;/td&gt;
&lt;td&gt;80GB HBM3&lt;/td&gt;
&lt;td&gt;3.35 TB/s&lt;/td&gt;
&lt;td&gt;989&lt;/td&gt;
&lt;td&gt;NVLink 4.0 (900 GB/s)&lt;/td&gt;
&lt;td&gt;Large model training, high-concurrency inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A100 80GB SXM&lt;/td&gt;
&lt;td&gt;80GB HBM2e&lt;/td&gt;
&lt;td&gt;2.0 TB/s&lt;/td&gt;
&lt;td&gt;~312&lt;/td&gt;
&lt;td&gt;NVLink 3.0 (600 GB/s)&lt;/td&gt;
&lt;td&gt;Multi-GPU training, 34B+ inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A100 80GB PCIe&lt;/td&gt;
&lt;td&gt;80GB HBM2e&lt;/td&gt;
&lt;td&gt;1.94 TB/s&lt;/td&gt;
&lt;td&gt;~312&lt;/td&gt;
&lt;td&gt;PCIe 4.0 (64 GB/s)&lt;/td&gt;
&lt;td&gt;Single-card inference, LoRA fine-tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L40S 48GB&lt;/td&gt;
&lt;td&gt;48GB GDDR6&lt;/td&gt;
&lt;td&gt;864 GB/s&lt;/td&gt;
&lt;td&gt;~362&lt;/td&gt;
&lt;td&gt;PCIe 4.0 (64 GB/s)&lt;/td&gt;
&lt;td&gt;Diffusion + LLM combo inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090 24GB&lt;/td&gt;
&lt;td&gt;24GB GDDR6X&lt;/td&gt;
&lt;td&gt;1.0 TB/s&lt;/td&gt;
&lt;td&gt;~82.6&lt;/td&gt;
&lt;td&gt;PCIe 4.0 (64 GB/s)&lt;/td&gt;
&lt;td&gt;Prototyping, quantized 7B-13B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AMD MI300X&lt;/td&gt;
&lt;td&gt;192GB HBM3&lt;/td&gt;
&lt;td&gt;5.3 TB/s&lt;/td&gt;
&lt;td&gt;~1307&lt;/td&gt;
&lt;td&gt;Infinity Fabric (XGMI)&lt;/td&gt;
&lt;td&gt;70B+ BF16 single-card serving&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start at the top of the table. The H100 SXM 80GB earns its price premium on any workload where inter-GPU communication, not raw compute, is what would otherwise constrain you: &lt;a href="https://www.nvidia.com/en-us/data-center/nvlink/" rel="noopener noreferrer"&gt;NVLink 4.0&lt;/a&gt; delivers 900 GB/s bidirectional bandwidth within a node, roughly 14x PCIe 4.0, which translates to substantially faster Allreduce across eight GPUs. The math becomes concrete on a 70B tensor-parallel deployment across four H100s, where every forward pass exchanges activation tensors at layer boundaries across cards via all-reduce. NVLink absorbs that traffic; PCIe 4.0 at 64 GB/s turns it into the bottleneck.&lt;/p&gt;

&lt;p&gt;If your job doesn’t need that interconnect, the A100 80GB is usually the right step down, and the choice between its two variants follows directly from the same bandwidth question. The PCIe variant delivers 1.94 TB/s of memory bandwidth versus the SXM’s 2.04 TB/s, close enough on a single card that memory-bound serving sees only marginal differences, so the PCIe variant runs 20-30% cheaper and fits single-card inference up to 34B at INT8 and LoRA fine-tuning of 8B-13B models. The SXM premium only pays off once you scale across cards, where NVLink 3.0 (600 GB/s) provides a 9.4x bandwidth advantage over PCIe 4.0 for tensor-parallel and Allreduce traffic.&lt;/p&gt;

&lt;p&gt;The L40S sits one tier below the A100 on memory bandwidth and one tier above on rendering silicon, which gives it a narrower but real niche. Its GDDR6 memory tops out at 864 GB/s, putting raw LLM inference throughput below an A100 80GB on memory-bound workloads, but the Ada Lovelace rasterization silicon makes it the right pick for mixed pipelines that combine image generation (&lt;a href="https://github.com/comfyanonymous/ComfyUI" rel="noopener noreferrer"&gt;ComfyUI&lt;/a&gt;, SDXL) with LLM text generation. It fits SDXL at full resolution alongside a 34B LLM in INT4 at a cost-per-hour that’s competitive for that specific combination.&lt;/p&gt;

&lt;p&gt;Below the L40S, the RTX 4090 24GB belongs in a different category entirely: prototyping, not production. At INT4 via bitsandbytes, it serves a quantized 13B model with meaningful throughput, but the 24GB VRAM ceiling and &lt;a href="https://www.nvidia.com/en-us/drivers/geforce-license/" rel="noopener noreferrer"&gt;NVIDIA EULA restrictions on datacenter use of GeForce GPUs&lt;/a&gt; keep it in the development and quantization-testing tier. Graduate to an A100 80GB once the workload moves to production.&lt;/p&gt;

&lt;p&gt;The AMD MI300X is the outlier in this lineup, and its case is narrow but compelling: a single card running Llama-3-70B in BF16. The &lt;a href="https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/data-sheets/amd-instinct-mi300x-data-sheet.pdf" rel="noopener noreferrer"&gt;192GB HBM3&lt;/a&gt; pool fits the full model with room for a usable KV cache, removing the complexity of a 4-GPU tensor-parallel setup, and &lt;a href="https://www.runpod.io/blog/mi300x-vs-h100-mixtral" rel="noopener noreferrer"&gt;Runpod’s MI300X vs H100 benchmark on Mixtral&lt;/a&gt; shows where that memory advantage translates into real throughput gains. The catch is the software side: &lt;a href="https://rocm.docs.amd.com/en/latest/" rel="noopener noreferrer"&gt;ROCm 6+&lt;/a&gt; has made PyTorch workable for standard training and inference, and ROCm became a first-class platform in vLLM as of early 2026 with prebuilt wheels, but custom CUDA extensions, Flash Attention variants, and Triton kernels still need to be checked against the ROCm HIP compatibility table and the &lt;a href="https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#amd-rocm" rel="noopener noreferrer"&gt;vLLM ROCm compatibility matrix&lt;/a&gt; before you commit, and tested on an actual MI300X instance before production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Networking: when interconnect becomes the bottleneck
&lt;/h3&gt;

&lt;p&gt;The NVLink 4.0 vs PCIe 4.0 gap covered above is the &lt;em&gt;within-node&lt;/em&gt; story; it’s only half of the interconnect picture once you scale beyond one chassis. The other half is what happens between nodes, and the two scales fail in different ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Within a single node&lt;/strong&gt;, the parallelism strategy decides how much that NVLink-vs-PCIe gap actually costs you. Tensor-parallel inference exchanges activations across all GPUs on every forward pass and is exquisitely sensitive to the gap, which is why H100 SXM nodes exist. Pipeline-parallel inference, by contrast, hands a single activation tensor from one stage to the next in one direction, so PCIe 4.0 is often adequate, and the SXM premium stops paying for itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Across nodes&lt;/strong&gt;, the relevant comparison is &lt;a href="https://www.nvidia.com/en-us/networking/products/infiniband/" rel="noopener noreferrer"&gt;InfiniBand NDR at 400 Gb/s&lt;/a&gt; vs 100GbE Ethernet, and the cost shows up in synchronous data-parallel training where &lt;a href="https://pytorch.org/tutorials/intermediate/dist_tuto.html" rel="noopener noreferrer"&gt;Allreduce&lt;/a&gt; gradient sync scales with model size and node count. A 70B run with 2-byte gradients moves 140GB per Allreduce step: roughly 11 seconds over 100GbE, under 3 seconds over InfiniBand NDR, and the Ethernet penalty grows with each node added. The practical heuristic: if your model fits on a single node for inference or LoRA fine-tuning (4x A100 80GB = 320GB covers 70B inference at BF16 with room for KV cache, or LoRA fine-tuning of the same model), stay there. Cross-node setup adds operational complexity that only memory constraints can justify.&lt;/p&gt;

&lt;p&gt;One footgun lives below both of those layers. &lt;a href="https://developer.nvidia.com/nccl" rel="noopener noreferrer"&gt;NCCL&lt;/a&gt; silently falls back to CPU-mediated transfers when direct GPU P2P isn’t available, cutting Allreduce throughput 30-40% versus correctly configured PCIe P2P (and far more versus NVLink). nvidia-smi topo -m flags this with PHB paths between GPUs; on some PCIe-only nodes, the fallback is unavoidable and needs to be priced into your projections. Verify topology and set NCCL P2P behavior explicitly before launching distributed training; the deployment section below covers the exact commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Align your pricing model to your usage pattern
&lt;/h2&gt;

&lt;p&gt;Picking the right instance only solves half the cost problem; the other half is how you pay for it, because demand fluctuates while capacity doesn’t, and most GPU deployments idle for long stretches at full per-hour rates. The fix is matching the pricing tier to the usage pattern, and Runpod’s three tiers correspond to three patterns most teams actually run.&lt;/p&gt;

&lt;p&gt;The first pattern is &lt;strong&gt;light or intermittent use&lt;/strong&gt;, which is where pay-as-you-go with per-second billing pays off. A 30-minute fine-tuning experiment billed per second costs materially less than the same run billed by the hour, and at ten experiments a day the delta compounds, so PAYG is the right default for experimentation and any workload running under four hours per day. Check &lt;a href="https://www.runpod.io/pricing" rel="noopener noreferrer"&gt;Runpod’s pricing page&lt;/a&gt; for current rates, since spot prices shift with capacity.&lt;/p&gt;

&lt;p&gt;Once usage crosses into &lt;strong&gt;sustained load&lt;/strong&gt; above roughly eight hours per day, that calculus inverts: per-second billing now charges premium rates on time the instance was going to be busy anyway. Reserved capacity is the answer for continuous training jobs or persistent inference endpoints, trading flexibility for meaningful per-hour savings and removing interruption risk from your critical path.&lt;/p&gt;

&lt;p&gt;The third pattern, &lt;strong&gt;bursty API traffic&lt;/strong&gt;, doesn’t fit either tier well: continuous reservation wastes budget at 3 am, and PAYG-per-second still pays for idle time between requests. &lt;a href="https://www.runpod.io/product/serverless" rel="noopener noreferrer"&gt;Serverless endpoints&lt;/a&gt; bill per request and scale to zero between them, so cost stays proportional to actual usage when traffic swings from 10,000 requests at launch to 200 overnight. The tradeoff is cold-start latency (60-180 seconds for a 70B model load), which is fine for batch APIs but requires a minimum worker count of one for user-facing endpoints; &lt;a href="https://www.runpod.io/blog/run-vllm-on-runpod-serverless" rel="noopener noreferrer"&gt;Runpod’s serverless vLLM guide&lt;/a&gt; covers the full deployment pattern.&lt;/p&gt;

&lt;p&gt;One lever cuts across all three tiers: &lt;strong&gt;quantization can change which instance class you’re paying for in the first place.&lt;/strong&gt; INT4 via bitsandbytes shrinks weight VRAM roughly 4x versus BF16, which is often enough to drop down a class, and the per-hour saving compounds across whichever pricing tier you’re on. Llama-3-70B in BF16 needs approximately 168GB and at least two H100 80 GB; at INT4, it fits a single A100 80GB at approximately 45-51GB. The catch is task sensitivity: generation and summarization typically see minimal accuracy loss from INT4, while reasoning, long-context retrieval, and code generation show measurable degradation, so verify by running 50-100 representative prompts side-by-side on BF16, and INT4 builds with &lt;a href="https://github.com/EleutherAI/lm-evaluation-harness" rel="noopener noreferrer"&gt;EleutherAI’s lm-evaluation-harness&lt;/a&gt; before you commit. &lt;a href="https://www.runpod.io/articles/guides/ai-model-quantization-reducing-memory-usage-without-sacrificing-performance" rel="noopener noreferrer"&gt;Runpod’s quantization guide&lt;/a&gt; covers the full quality tradeoff analysis.&lt;/p&gt;

&lt;p&gt;With a pricing model aligned to your usage pattern, the final step is deploying the container that translates your instance selection into a running endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy from container to serving endpoint
&lt;/h2&gt;

&lt;p&gt;Start with the base image, because a mismatched CUDA stack is the most common silent failure when a container moves between instance types. &lt;a href="https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch" rel="noopener noreferrer"&gt;NVIDIA’s NGC containers&lt;/a&gt; (e.g., nvcr.io/nvidia/pytorch:25.x-py3 at the latest stable tag) pin CUDA and cuDNN versions tested against specific GPU architectures, so pin the full image tag in your Dockerfile and test on the target instance class before pushing to production.&lt;/p&gt;

&lt;p&gt;With the base image fixed, the next choice is the serving framework. &lt;a href="https://docs.vllm.ai/en/latest/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; handles multi-GPU tensor-parallel inference, with PagedAttention allocating KV cache dynamically instead of reserving a worst-case slab up front. The --gpu-memory-utilization 0.90 flag caps the model executor at 90% of GPU memory (weights, activations, and KV cache blocks combined), leaving 10% free for framework overhead and preventing OOM at peak load.&lt;/p&gt;

&lt;p&gt;Here’s a minimal vLLM deployment for Llama-3.1-70B across four GPUs. Gated models require license acceptance on HuggingFace Hub and HF_TOKEN set in your environment (covered below).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; vllm.entrypoints.openai.api_server &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model&lt;/span&gt; meta-llama/Meta-Llama-3.1-70B-Instruct &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.90 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 4096
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That starts a 4-GPU tensor-parallel server with an OpenAI-compatible API endpoint. Verify the HuggingFace model ID before deploying, since Meta updates names across Llama versions; &lt;a href="https://www.runpod.io/blog/optimize-vllm-deployments-runpod-guidellm" rel="noopener noreferrer"&gt;Runpod’s vLLM optimization guide&lt;/a&gt; covers workload-specific --gpu-memory-utilization tuning and GuideLLM throughput benchmarking.&lt;/p&gt;

&lt;p&gt;For distributed training instead of serving, &lt;a href="https://docs.ray.io/en/latest/train/train.html" rel="noopener noreferrer"&gt;Ray Train&lt;/a&gt; with a TorchTrainer handles worker discovery and process group initialization on Runpod’s elastic training clusters. ray.init(address="auto") connects to an existing cluster (head node + workers), which must already be running; provision one via Runpod’s cluster console and grab the head node address from the dashboard.&lt;/p&gt;

&lt;p&gt;On PCIe-only nodes, training also needs explicit NCCL P2P configuration before launch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check GPU topology -- NV2/NV3/NV4 indicates NVLink; PHB or SYS indicates PCIe paths&lt;/span&gt;
nvidia-smi topo &lt;span class="nt"&gt;-m&lt;/span&gt;

&lt;span class="c"&gt;# Launch with P2P enabled, and NCCL debug output active&lt;/span&gt;
&lt;span class="nv"&gt;NCCL_P2P_DISABLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;NCCL_DEBUG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;INFO &lt;span class="se"&gt;\&lt;/span&gt;
  torchrun &lt;span class="nt"&gt;--nproc_per_node&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--nnodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
    train.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In NCCL_DEBUG output, “via NVL” confirms NVLink paths, “via P2P” means PCIe direct, and “via SYS” means CPU-mediated transfer (worst case for throughput).&lt;/p&gt;

&lt;p&gt;Credentials management is the same either way: inject HF_TOKEN, model registry credentials, and API keys as runtime environment variables, never baked into Docker layers (where they persist in image history across rebuilds and survive updates). Runpod’s console and SDK both support runtime env injection, which also makes rotation straightforward.&lt;/p&gt;

&lt;p&gt;Finally, verify the instance is actually earning its cost. Track VRAM with &lt;a href="https://docs.nvidia.com/deploy/nvidia-smi/index.html" rel="noopener noreferrer"&gt;nvidia-smi dmon -s u&lt;/a&gt; for per-second metrics, or &lt;a href="https://developer.nvidia.com/dcgm" rel="noopener noreferrer"&gt;DCGM&lt;/a&gt; for fleet-level monitoring with Prometheus. If a serving instance sits below 60% VRAM utilization at peak traffic, you’re over-provisioned: drop a class or raise the batch size to improve throughput per dollar.&lt;/p&gt;

&lt;h2&gt;
  
  
  Put it all together in four steps
&lt;/h2&gt;

&lt;p&gt;Each of the four decisions above maps to one node in this decision tree:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmupyx1km91ob0dlctipp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmupyx1km91ob0dlctipp.png" alt="Decision Matrix" width="800" height="1502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To walk this path with your own model, start with the VRAM number. Open a Python shell with your model config loaded and run sum(p.numel() for p in model.parameters()) * 2 / 1e9 to get the BF16 weight size in gigabytes. Add 20% for framework overhead and KV cache at moderate sequence lengths, then cross-reference the VRAM table above to find the smallest Runpod instance that clears it.&lt;/p&gt;

&lt;p&gt;If you want to skip the base image setup entirely, Runpod Hub carries pre-built templates for vLLM, &lt;a href="https://github.com/axolotl-ai-cloud/axolotl" rel="noopener noreferrer"&gt;Axolotl&lt;/a&gt; (fine-tuning), and &lt;a href="https://www.comfy.org/" rel="noopener noreferrer"&gt;ComfyUI&lt;/a&gt; (diffusion) with CUDA, cuDNN, and library versions pre-configured for the target workload. A template gets you from VRAM calculation to a live inference endpoint in under 15 minutes. Validate your instance choice against real traffic before committing to reserved capacity.&lt;/p&gt;

&lt;p&gt;Pick your model, run the calculation, and &lt;a href="https://www.runpod.io/console" rel="noopener noreferrer"&gt;start building on Runpod&lt;/a&gt; with no waitlist and no sales call required.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>infrastructure</category>
      <category>llm</category>
    </item>
    <item>
      <title>SQL database architecture, use cases, and monitoring: a practitioner's guide</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Wed, 22 Apr 2026 13:26:41 +0000</pubDate>
      <link>https://dev.to/damasosanoja/sql-database-architecture-use-cases-and-monitoring-a-practitioners-guide-16mk</link>
      <guid>https://dev.to/damasosanoja/sql-database-architecture-use-cases-and-monitoring-a-practitioners-guide-16mk</guid>
      <description>&lt;p&gt;Most SQL performance problems trace back to a handful of knobs, a handful of metrics, and the architecture that connects them. This guide covers all three across PostgreSQL, MySQL InnoDB, and SQL Server, starting with the cheat sheet you can act on today and working backward through the justification for every number in it.&lt;/p&gt;

&lt;p&gt;If you are setting up a new SQL deployment or auditing one you inherited, the next two tables are the answer. Screenshot them, calibrate the numbers against your own baseline (next section), and read on for the architecture that explains why each number sits where it does.&lt;/p&gt;

&lt;h3&gt;
  
  
  The tuning cheat sheet
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Knob&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;th&gt;MySQL (InnoDB)&lt;/th&gt;
&lt;th&gt;SQL Server&lt;/th&gt;
&lt;th&gt;Starting point&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Buffer pool size&lt;/td&gt;
&lt;td&gt;&lt;code&gt;shared_buffers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;innodb_buffer_pool_size&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max server memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL: 25% of host RAM on a dedicated database host, diminishing returns above 8-10 GB unless host has &amp;gt;32 GB RAM. MySQL: 70-80% of host RAM on a dedicated host. SQL Server: set &lt;code&gt;max server memory&lt;/code&gt; leaving ~10-15% of host RAM for the OS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Planner cache hint&lt;/td&gt;
&lt;td&gt;&lt;a href="https://postgresqlco.nf/doc/en/param/effective_cache_size/" rel="noopener noreferrer"&gt;&lt;code&gt;effective_cache_size&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;50-75% of host RAM; update alongside &lt;code&gt;shared_buffers&lt;/code&gt; so the planner accounts for OS page cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit durability&lt;/td&gt;
&lt;td&gt;&lt;code&gt;synchronous_commit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;innodb_flush_log_at_trx_commit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;(always on)&lt;/td&gt;
&lt;td&gt;Leave strict for financial data. Relax to &lt;code&gt;off&lt;/code&gt; on PostgreSQL (up to ~200 ms crash-loss window, bounded by &lt;code&gt;wal_writer_delay&lt;/code&gt;) or &lt;code&gt;2&lt;/code&gt; on MySQL (up to ~1 second crash-loss window, bounded by the once-per-second log flush) on event logs and session stores.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autovacuum aggressiveness&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.postgresql.org/docs/17/runtime-config-autovacuum.html" rel="noopener noreferrer"&gt;&lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;purge thread (tuned via &lt;code&gt;innodb_purge_batch_size&lt;/code&gt;, &lt;code&gt;innodb_purge_threads&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Drop PG from the 0.2 default to 0.01-0.05 on any table receiving millions of updates per day; apply per-table (see §4.2) rather than globally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection ceiling&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max_connections&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max_connections&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-worker-threads-server-configuration-option" rel="noopener noreferrer"&gt;&lt;code&gt;max worker threads&lt;/code&gt;&lt;/a&gt; (default 0 = auto)&lt;/td&gt;
&lt;td&gt;Size so (app pool size) × (app servers) × 1.2 stays under the ceiling; add pooler if math doesn't close. SQL Server has no &lt;code&gt;max_connections&lt;/code&gt; analog; for finer control use workload groups under Resource Governor.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snapshot isolation&lt;/td&gt;
&lt;td&gt;(on by default via MVCC)&lt;/td&gt;
&lt;td&gt;(on by default via MVCC)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://learn.microsoft.com/en-us/answers/questions/235815/sql-server-read-committed-snapshot-isolation" rel="noopener noreferrer"&gt;&lt;code&gt;ALTER DATABASE ... SET READ_COMMITTED_SNAPSHOT ON&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Enable RCSI on SQL Server databases carrying mixed OLTP and reporting, and budget for tempdb write pressure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The alerting cheat sheet
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Threshold that should page you&lt;/th&gt;
&lt;th&gt;What it is actually telling you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query p95 execution time, per query pattern&lt;/td&gt;
&lt;td&gt;2× the pattern's own two-week baseline&lt;/td&gt;
&lt;td&gt;A plan regression, stale statistics, or a new callsite running without an index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Buffer cache hit ratio&lt;/td&gt;
&lt;td&gt;Below 95% sustained on OLTP; below 99% on hot-data-heavy PG deployments; OLAP workloads may tolerate lower&lt;/td&gt;
&lt;td&gt;Working set exceeds buffer pool, or a cold cache after restart, or a sequential scan that should not be happening&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deadlock count&lt;/td&gt;
&lt;td&gt;Any non-zero count in a 5-minute window&lt;/td&gt;
&lt;td&gt;Lock ordering inconsistency in application code, not a database bug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lock wait count&lt;/td&gt;
&lt;td&gt;Rising trend, not an absolute number&lt;/td&gt;
&lt;td&gt;A long transaction holding row locks against OLTP traffic; usually surfaces upstream as HTTP 503s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection usage&lt;/td&gt;
&lt;td&gt;Sustained above 80% of the connection ceiling (75-90% range is defensible depending on risk tolerance)&lt;/td&gt;
&lt;td&gt;Pooling is undersized, missing, or the app is leaking connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replication lag&lt;/td&gt;
&lt;td&gt;Above your RPO target, not a universal number&lt;/td&gt;
&lt;td&gt;WAL sender saturation, slow replica consumer, network, or a long-running query on the replica&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit latency&lt;/td&gt;
&lt;td&gt;Above 10 ms on NVMe, 50 ms on SATA SSD&lt;/td&gt;
&lt;td&gt;fsync contention on the log volume, usually because data and log share the same disk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Baselining: capturing a fingerprint before the first incident
&lt;/h2&gt;

&lt;p&gt;Every number in the cheat sheet is a starting point, not a verdict. A 95% cache hit ratio is healthy on one workload and a disaster on another. The only way to know which side your deployment sits on is to capture a fingerprint before production traffic arrives, so that when it does, you have something to compare against.&lt;/p&gt;

&lt;p&gt;Synthetic load is the entry point. On PostgreSQL, &lt;a href="https://www.postgresql.org/docs/current/pgbench.html" rel="noopener noreferrer"&gt;&lt;code&gt;pgbench&lt;/code&gt;&lt;/a&gt; ships in the contrib package and runs a TPC-B-like workload out of the box. &lt;a href="https://www.cybertec-postgresql.com/en/a-formula-to-calculate-pgbench-scaling-factor-for-target-db-size/" rel="noopener noreferrer"&gt;&lt;code&gt;pgbench -i -s 50&lt;/code&gt;&lt;/a&gt; creates a dataset large enough that the working set pushes buffer pool behavior into realistic territory, and &lt;code&gt;pgbench -c 20 -j 4 -T 600&lt;/code&gt; drives it for ten minutes. Its final output gives you &lt;code&gt;tps&lt;/code&gt;, &lt;code&gt;latency average&lt;/code&gt;, and (with &lt;code&gt;--report-per-command&lt;/code&gt;) per-statement latency; the &lt;code&gt;latency average&lt;/code&gt; line and stddev map directly to the p95 query time fingerprint. On MySQL, &lt;a href="https://severalnines.com/blog/how-benchmark-performance-mysql-mariadb-using-sysbench/" rel="noopener noreferrer"&gt;&lt;code&gt;sysbench&lt;/code&gt;&lt;/a&gt; plays the same role, and its &lt;code&gt;oltp_read_write&lt;/code&gt; profile is a reasonable first cut. Read the &lt;code&gt;95th percentile&lt;/code&gt; line under &lt;code&gt;Latency (ms)&lt;/code&gt; in the summary. Tool setup is covered in vendor documentation; the signal you should capture once it's running is what matters here. Neither tool replaces an application-shaped load test, but they produce enough signal to detect whether your configuration is sane before real users expose the places it is not.&lt;/p&gt;

&lt;p&gt;A minimum viable baseline fingerprint covers six numbers, captured over a window long enough to include your full cycle of cron jobs and batch work (two weeks is the usable lower bound for p95 query time):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p95 query execution time per significant query pattern&lt;/li&gt;
&lt;li&gt;buffer cache hit ratio, on average and at its worst fifteen-minute window&lt;/li&gt;
&lt;li&gt;WAL or redo log write rate in bytes per second&lt;/li&gt;
&lt;li&gt;lock wait count per hour&lt;/li&gt;
&lt;li&gt;deadlock count per day (you are hoping for zero)&lt;/li&gt;
&lt;li&gt;replication lag peak, if you have replicas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Record each one, note the day and hour of its worst value, and keep the file somewhere your on-call rotation can find it. Every alert threshold in the cheat sheet becomes defensible once you can say "yes, we crossed 2× our own baseline" rather than "yes, we crossed a number we read on the internet."&lt;/p&gt;

&lt;p&gt;With a baseline in hand, the rest of the article explains why each number sits where it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why those numbers: the four components that dictate them
&lt;/h2&gt;

&lt;p&gt;Every SQL database, whether PostgreSQL 17+, MySQL 8.0, or SQL Server 2022, shares four components that each drive a specific row in the cheat sheet. The &lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide" rel="noopener noreferrer"&gt;query processor&lt;/a&gt;&lt;/strong&gt; parses, plans, and executes queries. The &lt;strong&gt;storage engine&lt;/strong&gt; handles physical reads and writes. The &lt;strong&gt;transaction log&lt;/strong&gt; (&lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;WAL&lt;/a&gt; in PostgreSQL, redo &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/innodb-redo-log.html" rel="noopener noreferrer"&gt;log in MySQL&lt;/a&gt; InnoDB) persists changes before commit. The &lt;strong&gt;buffer pool&lt;/strong&gt; caches data pages in memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  The query processor and the stale-statistics failure mode
&lt;/h3&gt;

&lt;p&gt;The query processor drives the "query p95 2× baseline" alert. Its &lt;a href="https://aws.amazon.com/blogs/database/determining-the-optimal-value-for-shared_buffers-using-the-pg_buffercache-extension-in-postgresql/" rel="noopener noreferrer"&gt;optimizer chooses a plan based on table statistics&lt;/a&gt;, and those statistics go stale the moment a batch load changes the row count without triggering a stats refresh. A table with 10 million rows whose stored statistics still claim 500,000 gets a full sequential scan where an index seek would have sufficed, and execution cost multiplies by orders of magnitude. What the monitoring dashboard shows is latency; what the database is doing is reading the entire heap.&lt;/p&gt;

&lt;p&gt;This is why a post-deployment p95 spike is worth checking for statistics invalidation before other root causes: a schema migration or large insert is a common statistics-invalidation event in a team's weekly rhythm.&lt;/p&gt;

&lt;h3&gt;
  
  
  The buffer pool and the hit-ratio threshold
&lt;/h3&gt;

&lt;p&gt;Sized memory is what separates a database that answers in milliseconds from one that answers in seconds, and the cache hit ratio alert is measuring exactly that. PostgreSQL's &lt;a href="https://www.postgresql.org/docs/current/runtime-config-resource.html" rel="noopener noreferrer"&gt;&lt;code&gt;shared_buffers&lt;/code&gt;&lt;/a&gt; defaults to 128 MB, which is adequate for a laptop and absurd on a host with 50 GB of hot data. MySQL InnoDB's &lt;code&gt;innodb_buffer_pool_size&lt;/code&gt; defaults to 128 MB for the same historical reason, though it &lt;a href="https://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool-resize.html" rel="noopener noreferrer"&gt;resizes dynamically since MySQL 5.7&lt;/a&gt;. &lt;a href="https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/server-memory-server-configuration-options?view=sql-server-ver17" rel="noopener noreferrer"&gt;SQL Server sizes its buffer pool automatically&lt;/a&gt; under the &lt;code&gt;max server memory&lt;/code&gt; ceiling.&lt;/p&gt;

&lt;p&gt;Every cache miss under an undersized buffer pool becomes a disk read, and the cost depends entirely on the medium:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Medium&lt;/th&gt;
&lt;th&gt;Read latency&lt;/th&gt;
&lt;th&gt;Penalty vs. RAM (sub-1 µs)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NVMe SSD&lt;/td&gt;
&lt;td&gt;~25 µs&lt;/td&gt;
&lt;td&gt;~25×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SATA SSD&lt;/td&gt;
&lt;td&gt;100–200 µs&lt;/td&gt;
&lt;td&gt;100–200×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15,000 RPM enterprise HDD&lt;/td&gt;
&lt;td&gt;4,000–6,000 µs (4–6 ms)&lt;/td&gt;
&lt;td&gt;4,000–6,000×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7,200 RPM consumer HDD&lt;/td&gt;
&lt;td&gt;10,000–15,000 µs (10–15 ms)&lt;/td&gt;
&lt;td&gt;10,000–15,000×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 10,000 queries per second, the difference between a 97% and an 87% hit ratio is the difference between a healthy database and a queue of backed-up requests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://postgresqlco.nf/doc/en/param/shared_buffers/" rel="noopener noreferrer"&gt;Changing &lt;code&gt;shared_buffers&lt;/code&gt; requires a PostgreSQL restart&lt;/a&gt;, and you should update &lt;code&gt;effective_cache_size&lt;/code&gt; at the same time so the planner accounts for the OS page cache on top of the buffer pool. Above roughly 8-10 GB the marginal return drops, so throwing RAM at the problem past that point is not the fix it looks like.&lt;/p&gt;

&lt;h3&gt;
  
  
  The transaction log and fsync latency
&lt;/h3&gt;

&lt;p&gt;Every major engine writes durability records before acknowledging a commit, and that &lt;a href="https://www.postgresql.org/docs/8.1/runtime-config-wal.html" rel="noopener noreferrer"&gt;fsync&lt;/a&gt; cost is what the commit latency alert is measuring. If the log volume sits on the same disk as the data volume, and that disk is under I/O pressure from buffer pool flushes or cache-miss reads, transaction commits queue behind every other operation on the disk and commit times spike while the query execution clock looks fine.&lt;/p&gt;

&lt;p&gt;The rule is boringly mechanical: put the log on its own volume, or verify that your cloud storage class gives the log volume headroom independent of the data volume.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pipeline end-to-end
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firvzingnujm5r7cxfwp5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firvzingnujm5r7cxfwp5.png" alt="Pipeline end-to-end" width="800" height="1397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The four components do not fail independently. A stale-statistics problem generates a sequential scan, which blows through the buffer pool, which triggers disk I/O that contends with transaction log writes, which inflates commit latency. One regression, four cheat-sheet rows lit up at once. This component-to-component cascade is why concurrency, the other layer of runtime behavior, is the next piece of the justification.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concurrency: the second layer of the 'why'
&lt;/h2&gt;

&lt;p&gt;The cheat sheet's deadlock, lock wait, and autovacuum thresholds all come from how the database enforces isolation and durability under concurrent load.&lt;/p&gt;

&lt;h3&gt;
  
  
  What ACID actually costs
&lt;/h3&gt;

&lt;p&gt;Atomicity pays for rollback capability with WAL writes on every transaction. Durability pays for crash safety with an &lt;a href="https://www.postgresql.org/docs/current/wal-async-commit.html" rel="noopener noreferrer"&gt;fsync on commit&lt;/a&gt;, which is the reason the PostgreSQL &lt;code&gt;synchronous_commit = off&lt;/code&gt; row in the cheat sheet exists. With async commit, writes return to the application before the fsync completes, and the exposure window on a crash is bounded by &lt;a href="https://postgresqlco.nf/doc/en/param/wal_writer_delay/" rel="noopener noreferrer"&gt;&lt;code&gt;wal_writer_delay&lt;/code&gt;&lt;/a&gt; (default 200 ms). For event logs and session stores that is fine; for financial records it is not. MySQL exposes an equivalent lever through &lt;a href="https://docs.netapp.com/us-en/ontap-apps-dbs/mysql/mysql-innodb_flush_log_at_trx_commit.html" rel="noopener noreferrer"&gt;&lt;code&gt;innodb_flush_log_at_trx_commit = 2&lt;/code&gt;&lt;/a&gt;, which flushes to the OS buffer once per second rather than on every commit and carries a crash-loss window of up to ~1 second.&lt;/p&gt;

&lt;p&gt;Isolation pays in one of two currencies: lock contention or MVCC bookkeeping. You do not get to opt out of both.&lt;/p&gt;

&lt;h3&gt;
  
  
  MVCC and the dead tuple tax
&lt;/h3&gt;

&lt;p&gt;PostgreSQL and MySQL InnoDB both use &lt;a href="https://en.wikipedia.org/wiki/Multiversion_concurrency_control" rel="noopener noreferrer"&gt;Multi-Version Concurrency Control&lt;/a&gt;. Readers get a consistent snapshot as of their transaction start; writers create new row versions rather than overwriting in place. The side effect, and the reason autovacuum is on the cheat sheet, is dead tuple accumulation. Every &lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt; leaves an old row version behind, and that version stays reachable until no active snapshot still references it.&lt;/p&gt;

&lt;p&gt;The default &lt;a href="https://postgresqlco.nf/doc/en/param/autovacuum_vacuum_scale_factor/" rel="noopener noreferrer"&gt;&lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt;&lt;/a&gt; of 0.2 waits until 20% of a table has changed before vacuuming runs. On a table receiving millions of updates per day, 20% is a long time, and bloat pushes sequential scan cost upward while evicting live pages from the buffer pool (which is how the cache hit ratio row and the autovacuum row on the cheat sheet are really the same row, seen from two angles). The trigger is actually &lt;code&gt;autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor × reltuples)&lt;/code&gt;. On a 20-million-row table the 50-row default threshold is irrelevant, but on a 5,000-row lookup table the threshold dominates and should be scaled down proportionally.&lt;/p&gt;

&lt;p&gt;In production, apply the aggressive scale factor per-table rather than globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;high_churn_table&lt;/span&gt;
  &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autovacuum_vacuum_scale_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;autovacuum_vacuum_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This avoids triggering frequent vacuums on small or rarely-updated tables that a global change would also hit. MySQL handles the same problem through its &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html" rel="noopener noreferrer"&gt;purge thread&lt;/a&gt;, and a growing &lt;a href="https://www.solarwinds.com/blog/what-is-innodb-history-list-length" rel="noopener noreferrer"&gt;"History list length"&lt;/a&gt; in &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt; is the canary that purging is falling behind.&lt;/p&gt;

&lt;p&gt;SQL Server defaults to &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-ver16" rel="noopener noreferrer"&gt;pessimistic row-level locking under &lt;code&gt;READ COMMITTED&lt;/code&gt;&lt;/a&gt;, which means readers and writers compete for the same locks on the same rows. &lt;a href="https://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/" rel="noopener noreferrer"&gt;Read Committed Snapshot Isolation&lt;/a&gt; swaps this for a version-store model closer to PostgreSQL's MVCC, and on databases carrying mixed OLTP and reporting traffic it typically cuts reader-writer lock wait counts visibly, at the cost of additional tempdb write pressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading a deadlock graph
&lt;/h3&gt;

&lt;p&gt;The deadlock row on the cheat sheet ("any non-zero count should page you") is defensible only if you know what to do with the graph when it fires. The classic two-transaction cycle looks like this when MySQL's &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlock-detection.html" rel="noopener noreferrer"&gt;&lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt;&lt;/a&gt; reports it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;------------------------&lt;/span&gt;
&lt;span class="n"&gt;LATEST&lt;/span&gt; &lt;span class="n"&gt;DETECTED&lt;/span&gt; &lt;span class="n"&gt;DEADLOCK&lt;/span&gt;
&lt;span class="c1"&gt;------------------------&lt;/span&gt;
&lt;span class="o"&gt;***&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="n"&gt;TRANSACTION&lt;/span&gt; &lt;span class="mi"&gt;4212&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ACTIVE&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="n"&gt;sec&lt;/span&gt; &lt;span class="n"&gt;starting&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;read&lt;/span&gt;
&lt;span class="n"&gt;mysql&lt;/span&gt; &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;LOCK&lt;/span&gt; &lt;span class="n"&gt;WAIT&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;lock&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;heap&lt;/span&gt; &lt;span class="k"&gt;size&lt;/span&gt; &lt;span class="mi"&gt;1136&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;row&lt;/span&gt; &lt;span class="k"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MySQL&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;112&lt;/span&gt; &lt;span class="n"&gt;localhost&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="n"&gt;updating&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="o"&gt;***&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;WAITING&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;THIS&lt;/span&gt; &lt;span class="k"&gt;LOCK&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;BE&lt;/span&gt; &lt;span class="k"&gt;GRANTED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="n"&gt;RECORD&lt;/span&gt; &lt;span class="n"&gt;LOCKS&lt;/span&gt; &lt;span class="k"&gt;space&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="mi"&gt;72&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="nv"&gt;`shop`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`accounts`&lt;/span&gt;
&lt;span class="n"&gt;trx&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;4212&lt;/span&gt; &lt;span class="n"&gt;lock_mode&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="n"&gt;locks&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="n"&gt;gap&lt;/span&gt; &lt;span class="n"&gt;waiting&lt;/span&gt;

&lt;span class="o"&gt;***&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="n"&gt;TRANSACTION&lt;/span&gt; &lt;span class="mi"&gt;4213&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ACTIVE&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="n"&gt;sec&lt;/span&gt; &lt;span class="n"&gt;starting&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;read&lt;/span&gt;
&lt;span class="n"&gt;mysql&lt;/span&gt; &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;lock&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;heap&lt;/span&gt; &lt;span class="k"&gt;size&lt;/span&gt; &lt;span class="mi"&gt;1136&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;row&lt;/span&gt; &lt;span class="k"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MySQL&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;113&lt;/span&gt; &lt;span class="n"&gt;localhost&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="n"&gt;updating&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="o"&gt;***&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;HOLDS&lt;/span&gt; &lt;span class="n"&gt;THE&lt;/span&gt; &lt;span class="k"&gt;LOCK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="n"&gt;RECORD&lt;/span&gt; &lt;span class="n"&gt;LOCKS&lt;/span&gt; &lt;span class="k"&gt;space&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="mi"&gt;72&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="nv"&gt;`shop`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`accounts`&lt;/span&gt;
&lt;span class="n"&gt;trx&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;4213&lt;/span&gt; &lt;span class="n"&gt;lock_mode&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="n"&gt;locks&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="n"&gt;gap&lt;/span&gt;

&lt;span class="o"&gt;***&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;WAITING&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;THIS&lt;/span&gt; &lt;span class="k"&gt;LOCK&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;BE&lt;/span&gt; &lt;span class="k"&gt;GRANTED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="n"&gt;RECORD&lt;/span&gt; &lt;span class="n"&gt;LOCKS&lt;/span&gt; &lt;span class="k"&gt;space&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="mi"&gt;72&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="nv"&gt;`shop`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`accounts`&lt;/span&gt;
&lt;span class="n"&gt;trx&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="mi"&gt;4213&lt;/span&gt; &lt;span class="n"&gt;lock_mode&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="n"&gt;locks&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="n"&gt;gap&lt;/span&gt; &lt;span class="n"&gt;waiting&lt;/span&gt;

&lt;span class="o"&gt;***&lt;/span&gt; &lt;span class="n"&gt;WE&lt;/span&gt; &lt;span class="n"&gt;ROLL&lt;/span&gt; &lt;span class="n"&gt;BACK&lt;/span&gt; &lt;span class="n"&gt;TRANSACTION&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the graph in four steps. First, confirm both transactions touch the same table and index (both rows above live in &lt;code&gt;PRIMARY of table 'shop'.'accounts'&lt;/code&gt;). Second, identify which rows each transaction already holds and which it is waiting for (transaction 2 holds &lt;code&gt;id = 1&lt;/code&gt; and wants &lt;code&gt;id = 2&lt;/code&gt;; transaction 1 is the mirror). Third, note the &lt;code&gt;query id&lt;/code&gt; of each waiter and walk it back through the application logs to find the callsite. Fourth, look at the order the rows are touched: one transaction updates &lt;code&gt;id = 2&lt;/code&gt; first, the other updates &lt;code&gt;id = 1&lt;/code&gt; first, and the inconsistent ordering is the actual bug. The &lt;a href="https://dev.to/techschoolguru/how-to-avoid-deadlock-in-db-transaction-queries-order-matter-oh7"&gt;fix is application-side&lt;/a&gt;, usually sorting locked keys before the transaction opens so that every caller acquires them in the same order.&lt;/p&gt;

&lt;p&gt;PostgreSQL does not print a waits-for graph; it logs each deadlock as a &lt;code&gt;ERROR: deadlock detected&lt;/code&gt; line with a &lt;code&gt;DETAIL&lt;/code&gt; block per process, provided you have &lt;a href="https://www.postgresql.org/docs/current/runtime-config-logging.html" rel="noopener noreferrer"&gt;&lt;code&gt;log_lock_waits = on&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;deadlock_timeout&lt;/code&gt; set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;ERROR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;deadlock&lt;/span&gt; &lt;span class="n"&gt;detected&lt;/span&gt;
&lt;span class="n"&gt;DETAIL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;Process&lt;/span&gt; &lt;span class="mi"&gt;18422&lt;/span&gt; &lt;span class="n"&gt;waits&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ShareLock&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;transaction&lt;/span&gt; &lt;span class="mi"&gt;9911&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;blocked&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="mi"&gt;18423&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
         &lt;span class="n"&gt;Process&lt;/span&gt; &lt;span class="mi"&gt;18423&lt;/span&gt; &lt;span class="n"&gt;waits&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ShareLock&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;transaction&lt;/span&gt; &lt;span class="mi"&gt;9912&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;blocked&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="mi"&gt;18422&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
         &lt;span class="n"&gt;Process&lt;/span&gt; &lt;span class="mi"&gt;18422&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
         &lt;span class="n"&gt;Process&lt;/span&gt; &lt;span class="mi"&gt;18423&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;HINT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;See&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="n"&gt;details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same four-step read applies: same table (extract from the &lt;code&gt;UPDATE&lt;/code&gt; fragments in &lt;code&gt;DETAIL&lt;/code&gt;), held vs. waiting (each &lt;code&gt;ShareLock on transaction&lt;/code&gt; line names the blocking PID), callsite (cross-reference the PID against &lt;code&gt;pg_stat_activity&lt;/code&gt; at the time of the error), and access order (compare the row IDs across the two UPDATEs). Each PostgreSQL deadlock entry is complete but separate per process; on InnoDB the monitor only reports the most recent cycle.&lt;/p&gt;

&lt;p&gt;The database is not broken. It detected the cycle, killed the cheaper transaction, and returned &lt;code&gt;ERROR 1213: Deadlock found when trying to get lock; try restarting transaction&lt;/code&gt; (InnoDB) or a &lt;code&gt;40P01&lt;/code&gt; SQLSTATE (PostgreSQL). Teams often spend days debugging application logic when the answer is a two-line change in the function that opens the transaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long transactions as a connection-pool killer
&lt;/h3&gt;

&lt;p&gt;A batch job that opens a transaction, processes 50,000 rows, and holds row-level locks for 90 seconds blocks concurrent OLTP writes against those same rows for the entire duration. Those writes do not fail. &lt;a href="https://severalnines.com/blog/how-fix-lock-wait-timeout-exceeded-error-mysql/" rel="noopener noreferrer"&gt;They queue behind the lock wait timeout&lt;/a&gt;, and while they queue, the connections they hold fill the application pool. A common first visible symptom is HTTP 503s at the load balancer, and the database-side lock wait often does not surface as an explicit error in the application logs. This is why the cheat sheet treats lock wait count as a rising-trend alert rather than a single-number threshold: the database is patient, and the pool dies first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replication topology and lag as a first-class metric
&lt;/h2&gt;

&lt;p&gt;Replication was a footnote in most database guides a decade ago. It is now the way you isolate analytics from OLTP, and replication lag is the second-fastest alert category to matter in managed environments, behind only query latency. Before you can reason about what lag signals, the topology itself has to justify its place in the runtime model, so start there.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read replicas and materialized views for analytics isolation
&lt;/h3&gt;

&lt;p&gt;Analytics queries create the opposite pressure from OLTP. A &lt;code&gt;GROUP BY&lt;/code&gt; over 200 million rows, or a three-way join against a fact table, produces a plan that runs for minutes on OLTP-class hardware and scans so many pages that it evicts everything else from the buffer pool. Running that kind of query against the primary is how you destroy the cache hit ratio for every other workload at once.&lt;/p&gt;

&lt;p&gt;Two architectural answers, used together more often than apart. A read replica takes the analytics traffic off the primary entirely; the primary's buffer pool stays warm with its real working set, and the replica can have its own planner settings tuned for long scans. A materialized view precomputes the aggregation so the analytics query reads kilobytes instead of gigabytes, and &lt;a href="https://www.postgresql.org/docs/current/sql-refreshmaterializedview.html" rel="noopener noreferrer"&gt;PostgreSQL's &lt;code&gt;REFRESH MATERIALIZED VIEW CONCURRENTLY&lt;/code&gt;&lt;/a&gt; lets the refresh run without blocking concurrent reads on the view (though it does require a unique index on the view, and will error loudly if one is missing).&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication lag as its own alert category
&lt;/h3&gt;

&lt;p&gt;Once you have replicas, lag is a metric in its own right. &lt;a href="https://www.pgedge.com/blog/understanding-and-reducing-postgresql-replication-lag" rel="noopener noreferrer"&gt;The cheat sheet leaves the threshold blank on purpose&lt;/a&gt;: a 10-second lag is fine on an analytics replica and catastrophic on a read-your-writes OLTP replica, so the number is whatever your RPO says it is.&lt;/p&gt;

&lt;p&gt;On PostgreSQL, the diagnostic query is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;application_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;client_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;sync_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;write_lag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;flush_lag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;replay_lag&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_replication&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three &lt;code&gt;_lag&lt;/code&gt; columns (available since PostgreSQL 10) return intervals, so the output reads directly as time and maps straight to RPO-based alert thresholds. &lt;a href="https://postgres.ai/docs/postgres-howtos/advanced-topics/replication/how-to-troubleshoot-streaming-replication-lag" rel="noopener noreferrer"&gt;The three columns separate the causes&lt;/a&gt;. A high &lt;code&gt;flush_lag&lt;/code&gt; points at slow replica disk I/O. A high &lt;code&gt;write_lag&lt;/code&gt; with a healthy &lt;code&gt;flush_lag&lt;/code&gt; more often indicates WAL receiver CPU saturation or a network socket issue on the replica side, not disks. A high &lt;a href="https://www.cybertec-postgresql.com/en/streaming-replication-conflicts-in-postgresql/" rel="noopener noreferrer"&gt;&lt;code&gt;replay_lag&lt;/code&gt; with healthy write and flush&lt;/a&gt; usually means a long-running query on the replica is blocking WAL replay (PostgreSQL applies WAL on a single process, and a conflicting reader can hold it off).&lt;/p&gt;

&lt;p&gt;When you need to pinpoint the bottleneck at byte granularity (for example, to estimate how many WAL segments a replica is behind), use &lt;code&gt;pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)&lt;/code&gt; from the same view.&lt;/p&gt;

&lt;p&gt;On MySQL, &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/show-replica-status.html" rel="noopener noreferrer"&gt;&lt;code&gt;SHOW REPLICA STATUS&lt;/code&gt;&lt;/a&gt; gives you &lt;code&gt;Seconds_Behind_Source&lt;/code&gt;, which is a reasonable first-cut metric with two failure modes to know. First, it returns &lt;code&gt;NULL&lt;/code&gt; when the I/O thread is disconnected, so a disconnected replica shows no lag rather than infinite lag, and an alerting rule that pages on high values only will miss the outage entirely. Second, with GTID-based replication the value can understate real lag when the replica executes transactions out of commit-timestamp order. For anything beyond the first cut, compare &lt;code&gt;GTID_SUBTRACT(@@GLOBAL.gtid_executed, Executed_Gtid_Set)&lt;/code&gt; between the source and the replica, or diff binlog positions directly.&lt;/p&gt;

&lt;p&gt;Rising lag is almost never a database bug. It is usually WAL sender saturation, a slow replica consumer, a network event, or a long-running replica query, in that order of likelihood.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managed services: what you can and cannot tune
&lt;/h2&gt;

&lt;p&gt;Every row of the opening cheat sheet assumes you can actually change the knob. On RDS, Cloud SQL, and Azure SQL, several of them are gated, and a few are gone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parameter group lockouts
&lt;/h3&gt;

&lt;p&gt;Managed services expose their &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Parameters.html" rel="noopener noreferrer"&gt;tuning surface through parameter groups&lt;/a&gt; (RDS, Cloud SQL) or database-scoped configuration (Azure SQL). The surface overlaps heavily with a self-managed deployment but is not identical: some parameters are dynamic and changeable at any time, some are static and require an instance reboot, and some are marked read-only and cannot be changed at all regardless of permissions.&lt;/p&gt;

&lt;p&gt;On &lt;a href="https://repost.aws/knowledge-center/rds-aurora-postgresql-shared-buffers" rel="noopener noreferrer"&gt;RDS PostgreSQL, &lt;code&gt;shared_buffers&lt;/code&gt;, &lt;code&gt;effective_cache_size&lt;/code&gt;, and &lt;code&gt;work_mem&lt;/code&gt;&lt;/a&gt; are available but require a parameter group change and, for the first one, a reboot. &lt;a href="https://postgresqlco.nf/doc/en/param/wal_level/" rel="noopener noreferrer"&gt;&lt;code&gt;wal_level&lt;/code&gt;&lt;/a&gt; is a static parameter: standard PostgreSQL only reads it at server startup, and in RDS it is controlled indirectly via the static &lt;code&gt;rds.logical_replication&lt;/code&gt; parameter, which &lt;a href="https://repost.aws/questions/QU6iggpgIPQQKY8l171wIXEw" rel="noopener noreferrer"&gt;also requires an instance reboot&lt;/a&gt;. Changing it has cascading effects on replication topology. A handful of parameters that exist in self-managed PostgreSQL &lt;a href="https://repost.aws/questions/QUdunymDS7TDCHazwVrcQztQ/amazon-rds-for-postgresql-configuration-differences-with-postgresql-on-ec2" rel="noopener noreferrer"&gt;are not exposed at all&lt;/a&gt;; verify any cheat-sheet row against your parameter group before committing to a remediation plan in an incident.&lt;/p&gt;

&lt;p&gt;On Azure SQL, &lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tiers-dtu?view=azuresql" rel="noopener noreferrer"&gt;the DTU model&lt;/a&gt; hides the concept of individual tuning knobs entirely in favor of a blended performance tier, while &lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tiers-sql-database-vcore?view=azuresql" rel="noopener noreferrer"&gt;the vCore model&lt;/a&gt; exposes more traditional sizing levers. If you inherited a DTU-model database and the cheat sheet tells you to resize the buffer pool, the answer is "move to vCore or resize the tier."&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection budgets and pooling
&lt;/h3&gt;

&lt;p&gt;Managed services cap &lt;code&gt;max_connections&lt;/code&gt; based on the instance class memory. An &lt;a href="https://medium.com/@bbakla/understanding-maximum-number-of-database-connections-on-aws-rds-c9a666d205e1" rel="noopener noreferrer"&gt;RDS &lt;code&gt;db.t3.medium&lt;/code&gt; (4 GB RAM) lands around 450 connections&lt;/a&gt;, following a memory-derived formula that effectively divides available memory by a per-connection overhead constant. If your application opens pools of 50 threads per app server and you run 10 app servers, you have consumed the entire connection budget with a single tier before any batch jobs or admin sessions show up. The cheat sheet's "keep usage under 80%" row assumes you did this math during deployment; on managed services, that math is not optional.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.pgbouncer.org/features.html" rel="noopener noreferrer"&gt;PgBouncer&lt;/a&gt; or RDS Proxy sits between the app and the database and multiplexes connections so the backend count stays flat while the client count grows. Use transaction pooling mode rather than session pooling mode; session pooling holds a connection for the entire client session and saves nothing worth having. The trade-off used to be that transaction mode broke server-side prepared statements, forcing applications that relied on &lt;code&gt;PREPARE&lt;/code&gt;/&lt;code&gt;EXECUTE&lt;/code&gt; to move preparation client-side or accept session mode's ceiling. &lt;a href="https://pganalyze.com/blog/5mins-postgres-pgbouncer-prepared-statements-transaction-mode" rel="noopener noreferrer"&gt;PgBouncer 1.21+ on PostgreSQL 14+ added protocol-level support for named prepared statements in transaction mode&lt;/a&gt; (via &lt;code&gt;max_prepared_statements&lt;/code&gt;), removing that constraint for teams on current versions. On older PgBouncer or pre-PG14, the trade-off still stands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage tiers and IOPS cliffs
&lt;/h3&gt;

&lt;p&gt;The buffer cache hit ratio row on the cheat sheet assumes the working set either fits in RAM or falls back to consistently-fast storage. On RDS, "consistently fast" is a storage class, not a given. &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html" rel="noopener noreferrer"&gt;&lt;code&gt;gp3&lt;/code&gt; volumes&lt;/a&gt; provide a baseline 3,000 IOPS for volumes under 400 GB and &lt;a href="https://ivancasco.com/blog/boost-your-aws-rds-performance-by-crossing-the-400gb-hidden-threshold/" rel="noopener noreferrer"&gt;12,000 IOPS above that threshold&lt;/a&gt; for most database engines, with provisioned IOPS decoupled from storage size. &lt;a href="https://aws.amazon.com/blogs/database/optimize-amazon-rds-performance-with-io2-block-express-storage-for-production-workloads/" rel="noopener noreferrer"&gt;&lt;code&gt;io2&lt;/code&gt; volumes&lt;/a&gt; provide provisioned IOPS contractually and are the right choice when your cache miss rate is high enough that fallback-to-storage is a hot path rather than a rare event. The older &lt;a href="https://aws.amazon.com/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/" rel="noopener noreferrer"&gt;&lt;code&gt;gp2&lt;/code&gt; class uses a burst credit model&lt;/a&gt; where cross-medium deployments can hit a cliff when credits drain, and the symptom looks exactly like a buffer cache regression (latency climbs, hit ratio stays flat) even though the root cause is storage throttling.&lt;/p&gt;

&lt;p&gt;Check the storage class before you act on a cache hit ratio alert on a managed deployment, because the right remediation is sometimes a storage class upgrade rather than a memory one. Cloud-native monitoring tools like &lt;a href="https://www.site24x7.com/database-monitoring.html" rel="noopener noreferrer"&gt;Site24x7's database monitoring&lt;/a&gt; can surface this distinction automatically across RDS, Azure SQL, and Google Cloud SQL by correlating I/O metrics against buffer pool behavior in a single view.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metrics and diagnostic follow-through
&lt;/h2&gt;

&lt;p&gt;The architecture sections above walked through what each component does and named the cheat-sheet row each one drives. This section pairs each of those rows with the exact diagnostic query you run when the alert fires, so measurement and investigation stop being two separate stages. Treat the subsections below as the operator's checklist; the architectural explanation lives upstream.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query execution time, with an EXPLAIN ANALYZE walkthrough
&lt;/h3&gt;

&lt;p&gt;Track query execution time per query pattern, not as a single dashboard aggregate. A p95 number that blends every query in the system hides the one that actually regressed.&lt;/p&gt;

&lt;p&gt;The sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL: &lt;a href="https://www.postgresql.org/docs/current/pgstatstatements.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_stat_statements&lt;/code&gt;&lt;/a&gt;, enabled via &lt;code&gt;shared_preload_libraries = 'pg_stat_statements'&lt;/code&gt; in &lt;code&gt;postgresql.conf&lt;/code&gt; followed by a restart, and then &lt;code&gt;CREATE EXTENSION pg_stat_statements;&lt;/code&gt; in each database where you want visibility. The &lt;code&gt;shared_preload_libraries&lt;/code&gt; change loads the module; the view is not queryable until the extension is installed.&lt;/li&gt;
&lt;li&gt;MySQL: the slow query log, enabled via &lt;code&gt;slow_query_log = ON&lt;/code&gt; and &lt;code&gt;long_query_time = 1&lt;/code&gt; in &lt;code&gt;my.cnf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;SQL Server: &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-query-stats-transact-sql?view=sql-server-ver16" rel="noopener noreferrer"&gt;&lt;code&gt;sys.dm_exec_query_stats&lt;/code&gt;&lt;/a&gt; joined to &lt;code&gt;sys.dm_exec_sql_text&lt;/code&gt;, available out of the box.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once a query pattern trips the alert, &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; is the next command you run. An 800 ms query against a large &lt;code&gt;members&lt;/code&gt; table looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;members&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;subscription_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active_paid'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;last_seen_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'90 days'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A plan that opens with &lt;code&gt;Seq Scan on members (cost=0.00..45231.00 rows=2847182 width=...) (actual time=0.031..823.400 rows=2841000 loops=1)&lt;/code&gt; tells you the engine is reading every row. The fields that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cost=0.00..45231.00&lt;/code&gt; is the planner's estimated startup and total cost in arbitrary units, useful for comparing plans rather than reading as absolute time.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rows=2847182&lt;/code&gt; is the planner's row estimate; compare it against the &lt;code&gt;actual rows&lt;/code&gt; number in the parentheses to detect stale statistics.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;actual time=0.031..823.400&lt;/code&gt; is the real execution time in milliseconds, first row to last row.&lt;/li&gt;
&lt;li&gt;The node with the highest &lt;code&gt;actual time&lt;/code&gt; spread is where optimization effort should go, and in this plan it is the &lt;code&gt;Seq Scan&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A composite index changes the access pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_members_state_seen&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;members&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscription_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_seen_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the index exists, &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; returns &lt;code&gt;Index Scan using idx_members_state_seen&lt;/code&gt; with actual execution time orders of magnitude lower than the sequential scan. Same schema, same query, different access pattern. MySQL 8.0.18+ supports &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/explain.html" rel="noopener noreferrer"&gt;&lt;code&gt;EXPLAIN ANALYZE FORMAT=TREE&lt;/code&gt;&lt;/a&gt; for equivalent runtime detail; &lt;code&gt;EXPLAIN FORMAT=JSON&lt;/code&gt; gives plan structure without runtime timing.&lt;/p&gt;

&lt;p&gt;One important caution the tooling does not remind you about: &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, or &lt;code&gt;DELETE&lt;/code&gt; statements actually executes the statement. Always wrap the call in a transaction and roll back, or you will quietly modify live data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Buffer cache hit ratio
&lt;/h3&gt;

&lt;p&gt;When the cheat sheet's hit ratio alert fires, anything below 90% on OLTP warrants immediate investigation: insufficient memory, a cold cache after restart, or working-set growth past buffer pool capacity. OLAP workloads may tolerate lower ratios, so calibrate against your baseline rather than a universal number.&lt;/p&gt;

&lt;p&gt;The diagnostic query for PostgreSQL pulls per-table hit ratios from &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_statio_user_tables&lt;/code&gt;&lt;/a&gt; so you can see exactly which tables are generating the disk reads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;heap_blks_hit&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heap_blks_hit&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;heap_blks_read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cache_hit_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;heap_blks_read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;heap_blks_hit&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_statio_user_tables&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cache_hit_pct&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ordering by &lt;code&gt;cache_hit_pct&lt;/code&gt; ascending surfaces the worst offenders first, which is the view you actually want during an incident. SQL Server's equivalent visibility comes through &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-buffer-descriptors-transact-sql?view=sql-server-ver17" rel="noopener noreferrer"&gt;&lt;code&gt;sys.dm_os_buffer_descriptors&lt;/code&gt;&lt;/a&gt;, aggregated by &lt;code&gt;database_id&lt;/code&gt; for a per-database view.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lock waits and deadlock counts
&lt;/h3&gt;

&lt;p&gt;The deadlock row on the cheat sheet fires on any non-zero count in a 5-minute window. The lock wait row is a rising-trend alert because absolute values vary too much by workload to set a universal threshold.&lt;/p&gt;

&lt;p&gt;For PostgreSQL, the live view of who is blocked on what comes from &lt;code&gt;pg_stat_activity&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;wait_event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;wait_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;usename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;application_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;query_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;wait_event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Lock'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'idle'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;query_start&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt; &lt;span class="n"&gt;NULLS&lt;/span&gt; &lt;span class="k"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;usename&lt;/code&gt; and &lt;code&gt;application_name&lt;/code&gt; columns give you attribution back to the source tier, which matters more than the PID during an incident. SQL Server's equivalent is &lt;code&gt;sys.dm_os_wait_stats&lt;/code&gt; filtered on &lt;code&gt;LCK_M_&lt;/code&gt; wait types for the class view, and &lt;code&gt;sys.dm_exec_requests&lt;/code&gt; filtered on &lt;code&gt;blocking_session_id IS NOT NULL&lt;/code&gt; for the live blocked sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection usage
&lt;/h3&gt;

&lt;p&gt;Sustained usage above 80% of the connection ceiling is the number to page on (some teams set 75-90% depending on risk tolerance). In PostgreSQL, &lt;code&gt;SELECT count(*) FROM pg_stat_activity&lt;/code&gt; gives the live number; in MySQL, &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/server-status-variables.html" rel="noopener noreferrer"&gt;&lt;code&gt;SHOW STATUS LIKE 'Threads_connected'&lt;/code&gt;&lt;/a&gt; returns the same value. On managed services, plug the current number into the &lt;code&gt;max_connections&lt;/code&gt; ceiling from the parameter group and check against the 80% line before the alert ever fires.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wait statistics on SQL Server
&lt;/h3&gt;

&lt;p&gt;SQL Server's &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql?view=sql-server-ver17" rel="noopener noreferrer"&gt;&lt;code&gt;sys.dm_os_wait_stats&lt;/code&gt;&lt;/a&gt; classifies accumulated wait time by type and lets you answer "am I CPU-bound, I/O-bound, lock-bound, or memory-bound" as a first cut before committing to a deeper investigation. The wait classes that matter most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SOS_SCHEDULER_YIELD&lt;/code&gt; for CPU waits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PAGEIOLATCH_*&lt;/code&gt; for I/O waits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LCK_M_*&lt;/code&gt; for lock waits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RESOURCE_SEMAPHORE&lt;/code&gt; for memory grant waits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Raw DMV output is dominated by benign background waits, so a filtered query is the one worth keeping. &lt;a href="https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/" rel="noopener noreferrer"&gt;Paul Randal's widely-cited exclusion list&lt;/a&gt; filters the idle types and returns signal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TOP&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
       &lt;span class="n"&gt;wait_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;wait_time_ms&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;wait_s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_time_ms&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;signal_wait_time_ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;resource_s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;signal_wait_time_ms&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;signal_s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;waiting_tasks_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dm_os_wait_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;wait_type&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s1"&gt;'SLEEP_TASK'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'BROKER_TO_FLUSH'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'SQLTRACE_BUFFER_FLUSH'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'CLR_AUTO_EVENT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'CLR_MANUAL_EVENT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'LAZYWRITER_SLEEP'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'SLEEP_SYSTEMTASK'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'WAITFOR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'BROKER_EVENTHANDLER'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'BROKER_RECEIVE_WAITFOR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'BROKER_TASK_STOP'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'DISPATCHER_QUEUE_SEMAPHORE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'FT_IFTS_SCHEDULER_IDLE_WAIT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'XE_DISPATCHER_WAIT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'XE_TIMER_EVENT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'REQUEST_FOR_DEADLOCK_SEARCH'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'CHECKPOINT_QUEUE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'TRACEWRITE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;wait_time_ms&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;wait_time_ms&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The top row of this output is the bottleneck class, and it changes every subsequent diagnostic step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start collecting signal today
&lt;/h3&gt;

&lt;p&gt;If you have not instrumented any of the above, three commands per engine get you to "I can see the slowest queries" without any external tooling.&lt;/p&gt;

&lt;p&gt;PostgreSQL, after &lt;code&gt;pg_stat_statements&lt;/code&gt; is enabled and the extension is created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;mean_exec_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;stddev_exec_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;rows&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_statements&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;mean_exec_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Surfacing &lt;code&gt;mean_exec_time&lt;/code&gt; and &lt;code&gt;stddev_exec_time&lt;/code&gt; directly (instead of computing an average from &lt;code&gt;total_exec_time / calls&lt;/code&gt;) makes regressions jump out: a high standard deviation on a query that used to run flat is usually a parameter-sensitive plan or a missing index on a newly-common parameter value. &lt;br&gt;
MySQL, after enabling the slow query log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mysqldumpslow &lt;span class="nt"&gt;-s&lt;/span&gt; at /var/log/mysql/mysql-slow.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/mysqldumpslow.html" rel="noopener noreferrer"&gt;&lt;code&gt;-s at&lt;/code&gt; flag&lt;/a&gt; sorts by average time per query rather than total time. Average time is the right sort for spotting regressions (the query that used to be fast and now is not); total time is the right sort for spotting high-frequency cost hogs that were always a little slow. You can run both; this version picks average-time because it catches the kind of incident this article is about.&lt;/p&gt;

&lt;p&gt;SQL Server, directly against the DMVs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TOP&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_elapsed_time&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execution_count&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_elapsed_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execution_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_execution_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;qt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dm_exec_query_stats&lt;/span&gt; &lt;span class="n"&gt;qs&lt;/span&gt;
&lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="n"&gt;APPLY&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dm_exec_sql_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sql_handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;qt&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;avg_elapsed_ms&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-query-stats-transact-sql?view=sql-server-ver16" rel="noopener noreferrer"&gt;Dividing by 1000.0&lt;/a&gt; gets you milliseconds rather than microseconds (easier to eyeball), and including &lt;code&gt;last_execution_time&lt;/code&gt; lets you spot recently-compiled plans that may still be in their post-deployment shakedown window.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL as its own observability instrument
&lt;/h2&gt;

&lt;p&gt;The five metric categories above cover operational threshold failures. They do not cover the class of failure where the data itself is silently wrong, and that class requires a different instrument.&lt;/p&gt;

&lt;p&gt;Consider a high-churn table where rows are supposed to receive a refresh event within a short time of creation. The alerting tooling can tell you the write rate, the read rate, the cache hit ratio, and the query latency. None of them can tell you that a subset of rows is being inserted and then never updated, which is the failure mode of a downstream worker that quietly stopped processing a specific partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stale_since&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;stale_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;stale_since&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'14 days'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;last_touched_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'24 hours'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stale_since&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A row whose &lt;code&gt;stale_since&lt;/code&gt; is inside the expected window but whose &lt;code&gt;last_touched_at&lt;/code&gt; has not advanced in 24 hours is a missed refresh event, not a latency spike. The query is cheap when &lt;code&gt;stale_since&lt;/code&gt; and &lt;code&gt;last_touched_at&lt;/code&gt; are indexed and expensive when they are not, and running it on a schedule catches the kind of incident the dashboard is structurally unable to see.&lt;/p&gt;

&lt;p&gt;The second pattern worth running is the orphan-row check. &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/tables/primary-and-foreign-key-constraints?view=sql-server-ver16" rel="noopener noreferrer"&gt;Foreign key constraints catch orphans at write time&lt;/a&gt; when they exist, but schemas that grew under application-layer integrity enforcement often lack constraints in some places. The anti-join surfaces the rows that should not exist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;invoices&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'7 days'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A non-empty result set here is almost always a bug in the delete path of the parent table: someone deleted customers without cleaning up their invoices, and the integrity of every downstream report that joins on &lt;code&gt;customer_id&lt;/code&gt; is quietly wrong. Alerting tools do not generate this query. You have to write it.&lt;/p&gt;

&lt;p&gt;These patterns are not examples of what SQL can do. They are signal you cannot get any other way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three incidents, easiest to hardest
&lt;/h2&gt;

&lt;p&gt;Three worked examples, ordered by diagnostic difficulty rather than frequency. The easiest is the most mechanical, and the hardest involves the most engine-specific knowledge. Each scenario closes with the cheat-sheet row that would have caught it earlier, which is the point of having a cheat sheet in the first place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario A: buffer cache hit ratio drops from 98% to 87% overnight
&lt;/h3&gt;

&lt;p&gt;Run the per-table hit-ratio query from the buffer cache section above. Tables with low hit ratios are generating disk reads; cross-reference them against &lt;code&gt;pg_stat_statements&lt;/code&gt; for queries whose &lt;code&gt;blks_read&lt;/code&gt; climbed after the last deployment. The usual culprits are a new query doing a full sequential scan on a large table, data growth that pushed the working set past &lt;code&gt;shared_buffers&lt;/code&gt;, or a missing index introduced by a schema migration.&lt;/p&gt;

&lt;p&gt;If a newly-deployed query is the cause, add a covering index. If data growth is the cause, raise &lt;code&gt;shared_buffers&lt;/code&gt; to 25% of available system RAM (&lt;a href="https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server" rel="noopener noreferrer"&gt;PostgreSQL's dedicated-host guideline&lt;/a&gt;), keeping in mind that the change requires a restart and that &lt;code&gt;effective_cache_size&lt;/code&gt; needs to move with it.&lt;/p&gt;

&lt;p&gt;The cheat-sheet row that would have caught it earlier: the cache hit ratio alert, set to page on "below 95% sustained." If the ratio had been paging the team at 94% instead of being noticed at 87%, the investigation would have started half a day earlier with a much smaller blast radius.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario B: application returns lock wait timeout errors during peak traffic
&lt;/h3&gt;

&lt;p&gt;On PostgreSQL, run the &lt;code&gt;pg_stat_activity&lt;/code&gt; query from the lock waits section. Identify the session holding the lock that the waiting sessions need, then look at its &lt;code&gt;query_start&lt;/code&gt;. A transaction open for 45 minutes during a window where the scheduled batch job runs for 5 minutes tells you the batch job never committed, and the batch job's held row-level locks are what the OLTP traffic is queueing behind.&lt;/p&gt;

&lt;p&gt;On SQL Server, the equivalent path is &lt;code&gt;sys.dm_exec_requests&lt;/code&gt; filtered on &lt;code&gt;blocking_session_id IS NOT NULL&lt;/code&gt; for the blocked-sessions view and &lt;code&gt;sys.dm_os_wait_stats&lt;/code&gt; filtered on &lt;code&gt;LCK_M_&lt;/code&gt; for the wait-type distribution.&lt;/p&gt;

&lt;p&gt;The remediation is one of three, in increasing order of intrusiveness: isolate batch processing to a maintenance window; enable RCSI on SQL Server so readers proceed against the version store while writers continue updating live rows; or split the batch transaction into smaller units so no single commit window is wide enough to queue the OLTP traffic behind it.&lt;/p&gt;

&lt;p&gt;Earlier detection: the lock wait count rising-trend alert. Lock waits do not go from zero to crisis in a single minute; they climb for the length of the batch job, and the rising trend is visible for twenty to thirty minutes before the first HTTP 503 shows up in the load balancer logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario C: query p95 latency doubles after a deployment
&lt;/h3&gt;

&lt;p&gt;On SQL Server, the prime suspect is &lt;a href="https://www.brentozar.com/archive/2013/06/the-elephant-and-the-mouse-or-parameter-sniffing-in-sql-server/" rel="noopener noreferrer"&gt;parameter sniffing&lt;/a&gt;. The optimizer caches an execution plan on first execution using the literal parameter values passed at that moment. If those values are skewed against the overall distribution, every subsequent call runs the suboptimal cached plan and latency climbs without a corresponding change in workload.&lt;/p&gt;

&lt;p&gt;Start with the &lt;code&gt;sys.dm_exec_query_stats&lt;/code&gt; query from the "Start collecting signal today" section. To isolate the exact statement rather than the full batch text, replace &lt;code&gt;qt.text&lt;/code&gt; with &lt;code&gt;SUBSTRING(qt.text, (qs.statement_start_offset/2)+1, ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.text) ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1)&lt;/code&gt;. Look for queries whose &lt;code&gt;avg_elapsed_ms&lt;/code&gt; climbed while &lt;code&gt;execution_count&lt;/code&gt; stayed flat or grew.&lt;/p&gt;

&lt;p&gt;Then retrieve the cached plan via &lt;code&gt;sys.dm_exec_query_plan&lt;/code&gt; to compare against a recompile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;qp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query_plan&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dm_exec_query_stats&lt;/span&gt; &lt;span class="n"&gt;qs&lt;/span&gt;
&lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="n"&gt;APPLY&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dm_exec_query_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plan_handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;qp&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;qs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sql_handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned &lt;code&gt;query_plan&lt;/code&gt; is XML. SSMS renders it as a graphical plan, and you can compare it directly against the plan produced by re-running &lt;a href="https://www.brentozar.com/archive/2016/08/start-troubleshooting-parameter-sniffing-issues/" rel="noopener noreferrer"&gt;the same query with &lt;code&gt;OPTION (RECOMPILE)&lt;/code&gt;&lt;/a&gt;. If the recompiled plan is meaningfully different and meaningfully faster, sniffing is confirmed. Remediation options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Immediate incident mitigation: &lt;a href="https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-freeproccache-transact-sql?view=sql-server-ver17" rel="noopener noreferrer"&gt;&lt;code&gt;DBCC FREEPROCCACHE(plan_handle)&lt;/code&gt;&lt;/a&gt; evicts the bad plan so the next call recompiles.&lt;/li&gt;
&lt;li&gt;Permanent per-query fix: add &lt;code&gt;OPTION (RECOMPILE)&lt;/code&gt; as a query hint, accepting the compilation cost on every execution.&lt;/li&gt;
&lt;li&gt;Plan stability alternative: &lt;a href="https://learn.microsoft.com/en-us/archive/blogs/mssqlisv/optimize-for-unknown-a-little-known-sql-server-2008-feature" rel="noopener noreferrer"&gt;&lt;code&gt;OPTION (OPTIMIZE FOR UNKNOWN)&lt;/code&gt;&lt;/a&gt; tells the optimizer to use average distribution statistics rather than first-call parameter values, which avoids the worst-case skew without paying the per-execution recompile cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On PostgreSQL, the same symptom more often traces to the stale-statistics failure mode described in the query processor section. Run &lt;code&gt;ANALYZE tablename&lt;/code&gt; after any large data load so the planner picks a correct plan on the next execution.&lt;/p&gt;

&lt;p&gt;Prevention point: a per-pattern p95 alert set to 2× baseline would have flagged the regression on the first post-deployment execution, rather than at whatever arbitrary threshold the aggregate dashboard happened to cross.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational hazards and compatibility notes
&lt;/h2&gt;

&lt;p&gt;Small-print items that would have bloated earlier sections and are worth knowing. Several of these cause incidents rather than mere confusion, so read the section as risk, not trivia.&lt;/p&gt;

&lt;p&gt;Reading deadlock graphs gets harder with three or more transactions. The two-transaction case in the concurrency section is the textbook shape; real production deadlocks often involve a third transaction holding a shared lock that neither cycle participant can bypass, and the InnoDB deadlock monitor only reports the most recent cycle rather than the full waits-for graph. On SQL Server, &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-deadlocks-guide?view=sql-server-ver17" rel="noopener noreferrer"&gt;capture the full graph via Extended Events with the &lt;code&gt;xml_deadlock_report&lt;/code&gt; event&lt;/a&gt; rather than relying on the system health session alone. On PostgreSQL, each deadlock log entry stands alone per process, so capturing a cycle with three or more participants means joining the &lt;code&gt;pg_stat_activity&lt;/code&gt; history for the PIDs listed in each &lt;code&gt;DETAIL&lt;/code&gt; block.&lt;/p&gt;

&lt;p&gt;All &lt;code&gt;pg_stat_statements&lt;/code&gt; queries in this article use PG13+ column names (&lt;code&gt;total_exec_time&lt;/code&gt;, &lt;code&gt;mean_exec_time&lt;/code&gt;, &lt;code&gt;stddev_exec_time&lt;/code&gt;). If you are still on a pre-13 version, the older names are &lt;code&gt;total_time&lt;/code&gt;, &lt;code&gt;mean_time&lt;/code&gt;, and &lt;code&gt;stddev_time&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For teams that want execution plan context correlated against APM trace IDs without writing the correlation layer manually, commercial tooling exists that handles this as a built-in view. &lt;a href="https://www.manageengine.com/it-operations-management/database-monitoring.html" rel="noopener noreferrer"&gt;ManageEngine OpManager Nexus&lt;/a&gt; covers the on-premise side, while &lt;a href="https://www.site24x7.com/help/database-monitoring/" rel="noopener noreferrer"&gt;Site24x7's database monitoring&lt;/a&gt; provides the cloud/SaaS counterpart for RDS, Aurora, Azure SQL, and self-managed instances. Both surface the correlation next to the &lt;code&gt;sys.dm_exec_query_stats&lt;/code&gt; join from Scenario C, rather than replacing it.&lt;/p&gt;

&lt;p&gt;The cheat sheet rows are not independent alerts. They form a causal chain: stale statistics trigger sequential scans, which blow through the buffer pool, which contend with transaction log writes, which inflate commit latency. When one row fires, the diagnostic path starts by checking whether the upstream component caused it. Think in chains, not rows, and the right fix surfaces faster.&lt;/p&gt;

&lt;p&gt;Pick one row from the alerting cheat sheet and turn it into a live signal by Friday. If you have paging infrastructure, wire the threshold into your on-call rotation. If you do not, schedule the matching diagnostic query as a cron job that writes to a log file you check daily. One row, one threshold, one query.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>performance</category>
      <category>sql</category>
    </item>
    <item>
      <title>Database Observability: An Engineer's Guide to Full-Stack Monitoring Across SQL, NoSQL, and Cloud Databases</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Wed, 08 Apr 2026 18:08:17 +0000</pubDate>
      <link>https://dev.to/damasosanoja/database-observability-an-engineers-guide-to-full-stack-monitoring-across-sql-nosql-and-cloud-1b3o</link>
      <guid>https://dev.to/damasosanoja/database-observability-an-engineers-guide-to-full-stack-monitoring-across-sql-nosql-and-cloud-1b3o</guid>
      <description>&lt;p&gt;Nobody plans a three-dashboard monitoring setup. It grows on its own. You deploy &lt;a href="https://dev.mysql.com/doc/" rel="noopener noreferrer"&gt;MySQL&lt;/a&gt;, so you add &lt;code&gt;mysqld_exporter&lt;/code&gt;. The team moves a workload to RDS, so you wire up a CloudWatch integration. Then &lt;a href="https://www.mongodb.com/docs/atlas/" rel="noopener noreferrer"&gt;MongoDB Atlas&lt;/a&gt; enters the stack, and Atlas ships its own metrics view. Three databases, three dashboards, three alert pipelines, zero correlation between them.&lt;/p&gt;

&lt;p&gt;At 2:47am, that fragmentation has a price. A &lt;a href="https://one2n.io/blog/sre-math-percentiles-in-sre-why-averages-lie-about-latency" rel="noopener noreferrer"&gt;p99&lt;/a&gt; latency spike fires an alert, and you spend fifteen minutes switching between tools before tracing it to a missing index. The data existed in three places. The relationship between those data points existed in none.&lt;/p&gt;

&lt;p&gt;That gap is the difference between &lt;a href="https://www.site24x7.com/what-is-database-monitoring.html" rel="noopener noreferrer"&gt;metric collection&lt;/a&gt; and observability. Metric collection tells you something crossed a threshold. Observability gives you the distributed trace connecting an application service, a SQL statement, host disk I/O, and a slow query log entry into one causal chain, so you can answer &lt;em&gt;why&lt;/em&gt; without adding new instrumentation after the incident starts.&lt;/p&gt;

&lt;p&gt;Most production environments already run this kind of mixed stack. &lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; handles transactional writes, MongoDB stores document data, Aurora or RDS manages read-heavy workloads, and a Redis or Memcached caching layer sits adjacent to all of it. This guide focuses on primary data stores: SQL, NoSQL, and cloud-managed databases. Caching layers have a different telemetry profile and are outside scope here. Each engine has a different telemetry model, a different collection method, and a different set of signals that actually predict trouble. Stitching observability across the full mix is the hard part, and it starts with knowing which signals to watch per engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually monitor, by database type
&lt;/h2&gt;

&lt;p&gt;A single &lt;a href="https://github.com/prometheus/mysqld_exporter" rel="noopener noreferrer"&gt;&lt;code&gt;mysqld_exporter&lt;/code&gt;&lt;/a&gt; instance can publish hundreds of Prometheus series. &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html" rel="noopener noreferrer"&gt;PostgreSQL's statistics collector&lt;/a&gt; exposes a comparable volume. During an incident, almost none of that matters. What matters is the handful of signals that predict user-facing degradation before it becomes a page.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQL databases: PostgreSQL and MySQL
&lt;/h3&gt;

&lt;p&gt;The signals worth watching for PostgreSQL and MySQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query latency at p50, p95, and p99.&lt;/strong&gt; Average latency hides the outliers your users actually feel. A mean of 12ms tells you nothing if the p99 is 800ms, because that 1% of slow requests lands on real user sessions and drives timeout errors, retry storms, and SLA breaches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active connections versus connection limit.&lt;/strong&gt; On PostgreSQL, compare &lt;code&gt;numbackends&lt;/code&gt; in &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_stat_database&lt;/code&gt;&lt;/a&gt; against &lt;a href="https://www.postgresql.org/docs/current/runtime-config-connection.html" rel="noopener noreferrer"&gt;&lt;code&gt;max_connections&lt;/code&gt;&lt;/a&gt;. On MySQL, compare &lt;code&gt;Threads_connected&lt;/code&gt; from &lt;code&gt;SHOW GLOBAL STATUS&lt;/code&gt; against the &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_connections" rel="noopener noreferrer"&gt;&lt;code&gt;max_connections&lt;/code&gt;&lt;/a&gt; system variable. Connection saturation causes query queuing before it causes timeouts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache hit ratio.&lt;/strong&gt; On PostgreSQL, that's &lt;code&gt;heap_blks_hit / (heap_blks_hit + heap_blks_read)&lt;/code&gt; from &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STATIO-ALL-TABLES-VIEW" rel="noopener noreferrer"&gt;&lt;code&gt;pg_statio_user_tables&lt;/code&gt;&lt;/a&gt;. A ratio &lt;a href="https://www.red-gate.com/hub/product-learning/redgate-monitor/understanding-postgresqls-cache-hit-ratio" rel="noopener noreferrer"&gt;below 95% signals trouble; aim for 99%&lt;/a&gt;. On MySQL, the equivalent is the &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool.html" rel="noopener noreferrer"&gt;InnoDB buffer pool hit ratio&lt;/a&gt;: &lt;code&gt;1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)&lt;/code&gt; from &lt;code&gt;SHOW GLOBAL STATUS&lt;/code&gt;, where the same 99%+ target applies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replication lag&lt;/strong&gt; in seconds. On PostgreSQL, query &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-VIEW" rel="noopener noreferrer"&gt;&lt;code&gt;pg_stat_replication&lt;/code&gt;&lt;/a&gt; for &lt;code&gt;replay_lag&lt;/code&gt;. Lag that climbs steadily means replicas are falling behind on writes, and read queries hitting those replicas will return stale data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.postgresql.org/docs/current/explicit-locking.html" rel="noopener noreferrer"&gt;Lock wait count&lt;/a&gt;.&lt;/strong&gt; Rising lock contention is the precursor to deadlocks. A sustained increase in waiting locks means transactions are blocking each other, and throughput will degrade before any single query times out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow query rate&lt;/strong&gt; over a rolling window. A sudden increase in the proportion of queries exceeding your slow-query threshold (typically 100ms-1s depending on workload) signals a regression, whether from a bad deployment, plan change, or resource contention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these signals aren't surfaced in default dashboards. You need to query them directly to establish a baseline before automating collection.&lt;/p&gt;

&lt;p&gt;The PostgreSQL cache hit ratio from &lt;code&gt;pg_statio_user_tables&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heap_blks_hit&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;nullif&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heap_blks_hit&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;heap_blks_read&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="mi"&gt;4&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;hit_ratio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_statio_user_tables&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;nullif&lt;/code&gt; call guards against division-by-zero on a cold instance where no blocks have been read yet. The &lt;code&gt;round&lt;/code&gt; wrapper gives you a clean four-decimal ratio instead of a long float.&lt;/p&gt;

&lt;p&gt;For query-level performance, &lt;a href="https://www.postgresql.org/docs/current/pgstatstatements.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_stat_statements&lt;/code&gt;&lt;/a&gt; is where the data lives on PostgreSQL. Once the extension is enabled (see the implementation section), this query pulls the top 15 queries by total execution time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;left&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;query_preview&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;total_exec_time&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_time_sec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;mean_exec_time&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;rows&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_statements&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_exec_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ordering matters. A query called 50,000 times at 2ms each burns far more total database time than one called 10 times at 500ms, yet only the latter trips a slow-query alert. Ranking by cumulative time surfaces both patterns.&lt;/p&gt;

&lt;p&gt;On MySQL, the equivalent lives in the &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/performance-schema.html" rel="noopener noreferrer"&gt;Performance Schema&lt;/a&gt;. The &lt;code&gt;events_statements_summary_by_digest&lt;/code&gt; table provides normalized query fingerprints with execution counts, total latency, and lock time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DIGEST_TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;query_digest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;COUNT_STAR&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;exec_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SUM_TIMER_WAIT&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_sec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AVG_TIMER_WAIT&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_sec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;SUM_ROWS_EXAMINED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;SUM_ROWS_SENT&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;performance_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events_statements_summary_by_digest&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;SUM_TIMER_WAIT&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MySQL's Performance Schema stores timer values in picoseconds, so the &lt;code&gt;/ 1e12&lt;/code&gt; conversion gives you seconds. The &lt;code&gt;SUM_ROWS_EXAMINED&lt;/code&gt; versus &lt;code&gt;SUM_ROWS_SENT&lt;/code&gt; comparison is useful too: a large gap between examined and sent rows often points to missing indexes.&lt;/p&gt;

&lt;p&gt;MySQL replication lag is available via &lt;code&gt;SHOW REPLICA STATUS\G&lt;/code&gt; under the &lt;code&gt;Seconds_Behind_Source&lt;/code&gt; field. If you're still on a version before 8.0.22, the command is &lt;code&gt;SHOW SLAVE STATUS&lt;/code&gt; and the field is &lt;code&gt;Seconds_Behind_Master&lt;/code&gt;; both old names were &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/mysql-nutshell.html" rel="noopener noreferrer"&gt;dropped entirely in MySQL 8.4&lt;/a&gt;. One caveat: this metric measures delay at the SQL apply thread, not end-to-end data freshness. Under multi-source replication or GTID-based topologies, it can report zero while a channel is actually stalled. Percona's &lt;code&gt;pt-heartbeat&lt;/code&gt; (or a custom heartbeat table that your application writes to and replicas read from) gives you a ground-truth lag measurement independent of the replication thread's self-reporting.&lt;/p&gt;

&lt;h3&gt;
  
  
  NoSQL databases: MongoDB
&lt;/h3&gt;

&lt;p&gt;MongoDB's signals that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Operation latency&lt;/strong&gt; from &lt;a href="https://www.mongodb.com/docs/manual/reference/command/serverStatus/#mongodb-serverstatus-serverstatus.opLatencies" rel="noopener noreferrer"&gt;&lt;code&gt;serverStatus.opLatencies&lt;/code&gt;&lt;/a&gt;, broken down by reads, writes, and commands. Separating read and write latency is critical because MongoDB workloads are often asymmetric, and a write latency spike won't show up in a combined average if reads dominate throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queue depth&lt;/strong&gt; via &lt;a href="https://www.mongodb.com/docs/v7.0/reference/command/serverstatus/" rel="noopener noreferrer"&gt;&lt;code&gt;globalLock.currentQueue.total&lt;/code&gt;&lt;/a&gt;. A rising queue means operations are waiting for execution faster than the engine can process them. Sustained queue growth precedes the latency cliff where response times go nonlinear.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replication oplog window&lt;/strong&gt; in hours. This is your buffer before a lagging secondary falls off the oplog and needs a full resync. An oplog window under 4 hours on a write-heavy deployment leaves little recovery margin (&lt;a href="https://www.mongodb.com/community/forums/t/oplog-window-best-practice-value/215225" rel="noopener noreferrer"&gt;community discussion on oplog sizing&lt;/a&gt; shows operators typically target 24+ hours). Your safe minimum depends on how long a full resync takes in your environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.mongodb.com/docs/manual/core/wiredtiger/" rel="noopener noreferrer"&gt;WiredTiger&lt;/a&gt; cache utilization&lt;/strong&gt; as a ratio of bytes in cache to the configured maximum (&lt;a href="https://www.percona.com/blog/mongodb-101-how-to-tune-your-mongodb-configuration-after-upgrading-to-more-memory/" rel="noopener noreferrer"&gt;default: the larger of 50% of (RAM minus 1 GB) or 256 MB&lt;/a&gt;). When the internal cache fills, eviction pressure forces the engine to discard and re-read pages more frequently. The resulting latency pattern looks like disk-bound behavior but originates inside the storage engine's own memory management, not the OS page cache. You won't identify this &lt;a href="https://www.percona.com/blog/mongodb-101-how-to-tune-your-mongodb-configuration-after-upgrading-to-more-memory/" rel="noopener noreferrer"&gt;eviction-driven latency&lt;/a&gt; from host-level memory metrics alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All four signals come from a single shell command. Run &lt;code&gt;db.runCommand({ serverStatus: 1 })&lt;/code&gt; and extract what you need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;serverStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Operation latency (microseconds) — split by read/write/command&lt;/span&gt;
&lt;span class="nf"&gt;printjson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;opLatencies&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Queue depth — operations waiting for execution&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Queued ops:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;globalLock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currentQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// WiredTiger cache pressure — ratio approaching 1.0 means eviction trouble&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wiredTiger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bytes currently in the cache&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wiredTiger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;maximum bytes configured&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cache fill:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;used&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the oplog window, &lt;code&gt;db.getReplicationInfo().timeDiff / 3600&lt;/code&gt; gives you hours of runway before a lagging secondary needs a full resync.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Atlas users:&lt;/strong&gt; On &lt;a href="https://www.mongodb.com/docs/atlas/monitor-cluster-metrics/" rel="noopener noreferrer"&gt;MongoDB Atlas&lt;/a&gt;, &lt;code&gt;serverStatus&lt;/code&gt; access depends on your cluster tier (M10+ for full stats). Atlas exposes metrics through its own Monitoring UI and the &lt;a href="https://www.mongodb.com/docs/atlas/api/atlas-admin-api-ref/" rel="noopener noreferrer"&gt;Atlas Administration API&lt;/a&gt;. The OTel &lt;code&gt;mongodb&lt;/code&gt; receiver connects to Atlas clusters via SRV connection strings (&lt;code&gt;mongodb+srv://&lt;/code&gt;) with SCRAM authentication.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud-managed databases: RDS, Aurora, and Cloud SQL
&lt;/h3&gt;

&lt;p&gt;With managed databases, you don't have SSH access or direct access to system views. The signals that matter are the same (connections, IOPS, replication, storage), but collection runs through cloud provider APIs instead.&lt;/p&gt;

&lt;p&gt;The signals to watch (metric names below use AWS CloudWatch conventions; Azure Monitor and GCP Cloud Monitoring expose equivalents under different names, e.g., &lt;code&gt;connection_count&lt;/code&gt; on Cloud SQL, &lt;code&gt;connection_successful&lt;/code&gt; on Azure SQL):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;DatabaseConnections&lt;/code&gt; versus the engine's max connection limit.&lt;/strong&gt; Managed instances enforce the same connection ceiling as self-hosted engines, but you can't tune OS-level socket limits to buy time. When you hit the cap, new connections are refused outright.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ReadIOPS&lt;/code&gt; and &lt;code&gt;WriteIOPS&lt;/code&gt; versus provisioned IOPS limits.&lt;/strong&gt; Exceeding provisioned IOPS triggers throttling at the storage layer, adding latency that looks like slow queries but originates below the engine. The queries themselves haven't changed; the disk can't keep up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;FreeStorageSpace&lt;/code&gt;.&lt;/strong&gt; Alert before autoscaling triggers, not after. Autoscaling events cause a brief I/O pause on some instance types, and if autoscaling is disabled, a full volume means writes stop entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ReplicaLag&lt;/code&gt;.&lt;/strong&gt; Same concern as self-managed replication: read replicas serving stale data. The difference is that you can't inspect the replication thread directly, so this CloudWatch metric is your only visibility into how far behind a replica has fallen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CPUCreditBalance&lt;/code&gt; on burstable instance types (T3, T4g).&lt;/strong&gt; A depleted credit balance is a hidden latency trigger that looks like a CPU spike but is actually credit exhaustion. Once credits hit zero, the instance is capped at baseline CPU, and every query slows down uniformly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Collection runs through &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricData.html" rel="noopener noreferrer"&gt;CloudWatch &lt;code&gt;GetMetricData&lt;/code&gt;&lt;/a&gt; for RDS and Aurora, the &lt;a href="https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/rest-api-walkthrough" rel="noopener noreferrer"&gt;Azure Monitor REST API&lt;/a&gt; for Azure SQL, and the &lt;a href="https://cloud.google.com/monitoring/api/v3" rel="noopener noreferrer"&gt;Cloud Monitoring API&lt;/a&gt; for Cloud SQL.&lt;/p&gt;

&lt;p&gt;The resolution tradeoff with CloudWatch matters. Standard RDS metrics publish at 1-minute intervals. &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.overview.html" rel="noopener noreferrer"&gt;AWS Enhanced Monitoring&lt;/a&gt; drops that to 1-second granularity for OS-level metrics, and &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html" rel="noopener noreferrer"&gt;Performance Insights&lt;/a&gt; adds DB load sampling at 1-second resolution with query-level attribution (the per-second samples are aggregated to produce the Top SQL view; query statistics themselves come from engine-level stats). Note: AWS has announced the &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.html" rel="noopener noreferrer"&gt;Performance Insights console experience will reach end-of-life on June 30, 2026&lt;/a&gt;, with functionality migrating to CloudWatch Database Insights. Native engine-level metrics through CloudWatch stay at 1-minute resolution, so transient sub-minute anomalies at the engine level are invisible by default.&lt;/p&gt;

&lt;p&gt;Hosted platforms like &lt;a href="https://www.site24x7.com/help/database-monitoring/" rel="noopener noreferrer"&gt;ManageEngine's database monitoring&lt;/a&gt; consolidate these cross-provider APIs into a single query interface, which is useful when a single fleet spans RDS, Azure SQL, and Cloud SQL simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Universal signals across all database types
&lt;/h3&gt;

&lt;p&gt;Regardless of engine, four metrics travel across any database and make cross-database comparison possible: query error rate, connection pool saturation (used / max), query throughput (QPS or TPS), and disk I/O wait percentage.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;th&gt;MySQL&lt;/th&gt;
&lt;th&gt;MongoDB&lt;/th&gt;
&lt;th&gt;AWS RDS / Aurora&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query latency&lt;/td&gt;
&lt;td&gt;pg_stat_statements (total_exec_time)&lt;/td&gt;
&lt;td&gt;events_statements_summary_by_digest&lt;/td&gt;
&lt;td&gt;opLatencies (reads/writes/commands)&lt;/td&gt;
&lt;td&gt;ReadLatency, WriteLatency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection pressure&lt;/td&gt;
&lt;td&gt;numbackends vs max_connections&lt;/td&gt;
&lt;td&gt;Threads_connected vs max_connections&lt;/td&gt;
&lt;td&gt;currentQueue.total&lt;/td&gt;
&lt;td&gt;DatabaseConnections vs engine max&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache health&lt;/td&gt;
&lt;td&gt;heap_blks_hit ratio (target ≥99%)&lt;/td&gt;
&lt;td&gt;InnoDB buffer pool hit ratio&lt;/td&gt;
&lt;td&gt;WiredTiger cache fill ratio&lt;/td&gt;
&lt;td&gt;BufferCacheHitRatio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replication delay&lt;/td&gt;
&lt;td&gt;pg_stat_replication.replay_lag&lt;/td&gt;
&lt;td&gt;Seconds_Behind_Source (or pt-heartbeat)&lt;/td&gt;
&lt;td&gt;oplog window in hours&lt;/td&gt;
&lt;td&gt;ReplicaLag (seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slow query signal&lt;/td&gt;
&lt;td&gt;pg_stat_statements + slow log&lt;/td&gt;
&lt;td&gt;slow_query_log + Perf Schema&lt;/td&gt;
&lt;td&gt;currentOp + database profiler&lt;/td&gt;
&lt;td&gt;Performance Insights / Database Insights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage / I/O pressure&lt;/td&gt;
&lt;td&gt;blks_read, I/O wait %&lt;/td&gt;
&lt;td&gt;Innodb_data_reads, I/O wait %&lt;/td&gt;
&lt;td&gt;WiredTiger eviction rate&lt;/td&gt;
&lt;td&gt;WriteIOPS vs provisioned IOPS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Knowing which signals matter is the first step. Collecting them consistently across every engine in a single pipeline is the next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a unified telemetry pipeline
&lt;/h2&gt;

&lt;p&gt;Three approaches exist for collecting database telemetry in production, each with a different tradeoff between setup speed, vendor independence, and long-term maintenance cost:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vendor agents with proprietary instrumentation.&lt;/strong&gt; Fastest to deploy and lowest initial maintenance since the vendor manages the agent lifecycle. The cost is vendor independence: switching backends means re-instrumenting everything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://prometheus.io/docs/instrumenting/exporters/" rel="noopener noreferrer"&gt;Prometheus exporters&lt;/a&gt;&lt;/strong&gt; (&lt;code&gt;postgres_exporter&lt;/code&gt;, &lt;code&gt;mysqld_exporter&lt;/code&gt;, &lt;code&gt;mongodb_exporter&lt;/code&gt;). Moderate setup, vendor-neutral, and battle-tested. Maintenance stays low once running, but they're metric-only. They don't share a data model with your application traces, so correlation requires stitching across separate pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://opentelemetry.io/docs/collector/" rel="noopener noreferrer"&gt;OpenTelemetry Collector&lt;/a&gt; with database-specific receivers.&lt;/strong&gt; Its &lt;code&gt;postgresql&lt;/code&gt;, &lt;code&gt;mysql&lt;/code&gt;, and &lt;code&gt;mongodb&lt;/code&gt; receivers normalize metrics into shared &lt;a href="https://opentelemetry.io/docs/specs/semconv/database/" rel="noopener noreferrer"&gt;semantic conventions&lt;/a&gt;, so telemetry from different engines lands in a comparable format. Fully vendor-portable and trace-aware, but the most setup effort upfront and the highest ongoing maintenance (config drift, biweekly releases, semantic convention changes).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This guide uses the OTel Collector path. As of 2026, &lt;a href="https://dev.to/kubefeeds/observability-in-2025-opentelemetry-and-ai-to-fill-in-gaps-4bpm"&gt;OpenTelemetry is the de facto standard for new observability instrumentation&lt;/a&gt;, and it's the only option above that unifies database metrics and application traces under the same data model. Building on proprietary agents now means repeating this work at the next platform migration.&lt;/p&gt;

&lt;p&gt;Two common &lt;a href="https://opentelemetry.io/docs/collector/deployment/" rel="noopener noreferrer"&gt;deployment patterns&lt;/a&gt; exist. In &lt;strong&gt;agent mode&lt;/strong&gt;, a Collector runs on each database host, collects local metrics, and forwards them to a central gateway or directly to the backend. In &lt;strong&gt;gateway mode&lt;/strong&gt;, a centralized Collector reaches out to remote database endpoints. Agent mode gives you host-level correlation for free (the Collector inherits &lt;code&gt;host.id&lt;/code&gt;). Gateway mode reduces the number of Collector instances to manage. Most production setups use agent mode for self-managed databases and gateway mode for cloud-managed instances where you can't deploy locally.&lt;/p&gt;

&lt;p&gt;The following sections walk through receiver configuration for each database type, starting with PostgreSQL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up the PostgreSQL receiver
&lt;/h3&gt;

&lt;p&gt;One gotcha before the first receiver config: the &lt;code&gt;postgresql&lt;/code&gt;, &lt;code&gt;mysql&lt;/code&gt;, and &lt;code&gt;mongodb&lt;/code&gt; receivers ship in the &lt;a href="https://opentelemetry.io/docs/collector/distributions/" rel="noopener noreferrer"&gt;contrib distribution&lt;/a&gt;, not the core binary. Download &lt;a href="https://github.com/open-telemetry/opentelemetry-collector-releases/releases" rel="noopener noreferrer"&gt;&lt;code&gt;otelcol-contrib&lt;/code&gt;&lt;/a&gt; (also available as Docker image &lt;code&gt;otel/opentelemetry-collector-contrib&lt;/code&gt;) or the receivers won't be available. The configs below were validated against &lt;code&gt;otelcol-contrib&lt;/code&gt; v0.115.0. Receiver config schemas can change between releases; check the &lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver" rel="noopener noreferrer"&gt;receiver README&lt;/a&gt; for your installed version if you encounter validation errors.&lt;/p&gt;

&lt;p&gt;Create a dedicated monitoring user on your PostgreSQL instance (PostgreSQL 10+):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;otel_reader&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'change_me'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;pg_monitor&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;otel_reader&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/18/functions-admin.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_monitor&lt;/code&gt;&lt;/a&gt; is a built-in role (introduced in PostgreSQL 10) that bundles read access to every statistics view the receiver needs: activity stats, background writer stats, database-level stats, and &lt;code&gt;pg_stat_statements&lt;/code&gt; if the extension is loaded. On PostgreSQL 9.x, you'll need to grant &lt;code&gt;SELECT&lt;/code&gt; on each view individually since the bundled role doesn't exist.&lt;/p&gt;

&lt;p&gt;A minimal OTel Collector configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgresql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost:5432&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel_reader&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${env:PGMON_PASS}"&lt;/span&gt;
    &lt;span class="na"&gt;databases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app_prod&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app_analytics&lt;/span&gt;
    &lt;span class="na"&gt;collection_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20s&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;insecure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# disable for production; configure certs instead&lt;/span&gt;

&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp/primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;otel-gateway.internal:4317"&lt;/span&gt;

&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;postgresql&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp/primary&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details worth noting. The &lt;code&gt;tls: insecure: true&lt;/code&gt; flag disables TLS verification, acceptable for local development but not production. The &lt;code&gt;${env:VAR_NAME}&lt;/code&gt; syntax is the Collector's built-in expansion for OS environment variables. The Collector doesn't read &lt;code&gt;.env&lt;/code&gt; files, so set them before starting the process (e.g., &lt;code&gt;export PGMON_PASS=secret &amp;amp;&amp;amp; ./otelcol-contrib --config config.yaml&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/postgresqlreceiver/README.md" rel="noopener noreferrer"&gt;&lt;code&gt;postgresql&lt;/code&gt; receiver&lt;/a&gt; pulls metrics from &lt;code&gt;pg_stat_bgwriter&lt;/code&gt;, &lt;code&gt;pg_stat_database&lt;/code&gt;, and related system views. At the span level, verify that &lt;code&gt;db.system.name&lt;/code&gt;, &lt;code&gt;db.operation.name&lt;/code&gt;, and &lt;code&gt;db.query.text&lt;/code&gt; attributes are populating (these are the current names per &lt;a href="https://opentelemetry.io/docs/specs/semconv/database/" rel="noopener noreferrer"&gt;OTel Semantic Conventions v1.33.0&lt;/a&gt;). Older documentation may reference the deprecated &lt;code&gt;db.system&lt;/code&gt;, &lt;code&gt;db.operation&lt;/code&gt;, and &lt;code&gt;db.statement&lt;/code&gt; attributes, so check which version your instrumentation library implements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up the MySQL receiver
&lt;/h3&gt;

&lt;p&gt;The same pattern applies: create a monitoring user, then point the receiver at it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL 8.0+ monitoring role&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="s1"&gt;'otel_reader'&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'localhost'&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'change_me'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;PROCESS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;REPLICATION&lt;/span&gt; &lt;span class="n"&gt;CLIENT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="s1"&gt;'otel_reader'&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'localhost'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;performance_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="s1"&gt;'otel_reader'&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'localhost'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mysql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost:3306&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel_reader&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${env:MYMON_PASS}"&lt;/span&gt;
    &lt;span class="na"&gt;collection_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20s&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;insecure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;mysql&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp/primary&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/mysqlreceiver/README.md" rel="noopener noreferrer"&gt;&lt;code&gt;mysql&lt;/code&gt; receiver&lt;/a&gt; collects from &lt;code&gt;SHOW GLOBAL STATUS&lt;/code&gt;, &lt;code&gt;SHOW REPLICA STATUS&lt;/code&gt;, and &lt;code&gt;performance_schema&lt;/code&gt; tables. Enable Performance Schema (&lt;code&gt;performance_schema=ON&lt;/code&gt; in &lt;code&gt;my.cnf&lt;/code&gt;) for query-level metrics. It has been on by default since MySQL 5.6.6, so most installations already have it active.&lt;/p&gt;

&lt;h3&gt;
  
  
  Collecting CloudWatch metrics for RDS
&lt;/h3&gt;

&lt;p&gt;Cloud-managed databases don't allow local agent deployment, so the collection path differs. The OTel Collector's &lt;code&gt;awscloudwatchreceiver&lt;/code&gt; only supports logs, not metrics. For RDS metric collection through the OTel pipeline, the proven approach is &lt;a href="https://github.com/prometheus-community/yet-another-cloudwatch-exporter" rel="noopener noreferrer"&gt;YACE (Yet Another CloudWatch Exporter)&lt;/a&gt;, a Prometheus exporter maintained under the &lt;code&gt;prometheus-community&lt;/code&gt; org. YACE polls CloudWatch's &lt;code&gt;GetMetricData&lt;/code&gt; API and exposes the results as Prometheus metrics, which the Collector scrapes via its &lt;code&gt;prometheus&lt;/code&gt; receiver.&lt;/p&gt;

&lt;p&gt;YACE uses the standard AWS credential chain (instance profile, &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;/&lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;, or &lt;code&gt;~/.aws/credentials&lt;/code&gt;). The IAM principal requires &lt;code&gt;cloudwatch:GetMetricData&lt;/code&gt;, &lt;code&gt;cloudwatch:ListMetrics&lt;/code&gt;, and &lt;code&gt;tag:GetResources&lt;/code&gt; permissions.&lt;/p&gt;

&lt;p&gt;YACE configuration (&lt;code&gt;yace-config.yml&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;discovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS/RDS&lt;/span&gt;
      &lt;span class="na"&gt;regions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;eu-west-1&lt;/span&gt;
      &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DatabaseConnections&lt;/span&gt;
          &lt;span class="na"&gt;statistics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Average&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
          &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
          &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReadIOPS&lt;/span&gt;
          &lt;span class="na"&gt;statistics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Average&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
          &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
          &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReplicaLag&lt;/span&gt;
          &lt;span class="na"&gt;statistics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Maximum&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
          &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
          &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;YACE auto-discovers all RDS instances in the specified region. To limit to specific instances, add a &lt;code&gt;searchTags&lt;/code&gt; filter with a tag key/value pair you've applied to your RDS instances.&lt;/p&gt;

&lt;p&gt;YACE exposes metrics on port 5000 by default. Point the OTel Collector's &lt;code&gt;prometheus&lt;/code&gt; receiver at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;prometheus/cloudwatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;scrape_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;job_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yace-rds&lt;/span&gt;
          &lt;span class="na"&gt;scrape_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;
          &lt;span class="na"&gt;static_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost:5000"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;prometheus/cloudwatch&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp/primary&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;scrape_interval&lt;/code&gt; should match YACE's &lt;code&gt;period&lt;/code&gt; to avoid gaps or duplicate data points.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up the MongoDB receiver
&lt;/h3&gt;

&lt;p&gt;Back to the standard pattern for self-managed instances. Create a monitoring user with the &lt;code&gt;clusterMonitor&lt;/code&gt; role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Run in mongosh connected to the admin database&lt;/span&gt;
&lt;span class="nx"&gt;use&lt;/span&gt; &lt;span class="nx"&gt;admin&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;otel_reader&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;pwd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;change_me&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;clusterMonitor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;admin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;read&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;local&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;   &lt;span class="c1"&gt;// needed for oplog access&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mongodb&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mongo-primary.internal:27017&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel_reader&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${env:MONGOMON_PASS}"&lt;/span&gt;
    &lt;span class="na"&gt;collection_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20s&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;insecure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This receiver collects the &lt;code&gt;serverStatus&lt;/code&gt; metrics covered earlier (operation latency, queue depth, WiredTiger cache utilization, and replication oplog data) without requiring manual shell queries. For Atlas clusters, the same receiver connects via SRV connection strings (&lt;code&gt;mongodb+srv://&lt;/code&gt;) with SCRAM authentication; replace the &lt;code&gt;endpoint&lt;/code&gt; with your Atlas SRV URI.&lt;/p&gt;

&lt;h3&gt;
  
  
  The complete pipeline
&lt;/h3&gt;

&lt;p&gt;With all four receivers configured, the pipeline routes through a single Collector:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fny2j9tfp962ksb8y5b4q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fny2j9tfp962ksb8y5b4q.png" alt=" " width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All telemetry, from a PostgreSQL instance on-prem, a MongoDB Atlas cluster, or an RDS replica in &lt;code&gt;us-east-1&lt;/code&gt;, routes through the same collector, lands in the same backend, and shares the same resource attributes (&lt;code&gt;host.id&lt;/code&gt;, &lt;code&gt;service.name&lt;/code&gt;, &lt;code&gt;db.name&lt;/code&gt;). Those shared attributes are what make cross-signal correlation possible, which is where the real incident-resolution speed comes from.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-signal correlation: three axes that close incidents
&lt;/h2&gt;

&lt;p&gt;A unified pipeline gives you the raw material. But collection alone doesn't explain &lt;em&gt;why&lt;/em&gt; a latency spike happened. A PostgreSQL dashboard showing elevated p95 tells you something is wrong. It doesn't tell you whether the cause is a bad query, a contended host, or a deployment that changed application behavior. Answering that requires correlating database metrics with signals from outside the database.&lt;/p&gt;

&lt;p&gt;Three correlation axes progressively narrow the search space during an incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Axis 1: Database metrics + APM traces = which query caused it.&lt;/strong&gt; Slow database spans in distributed traces carry &lt;code&gt;db.query.text&lt;/code&gt; attributes that link directly to the responsible statement. When p95 spikes, the span shows the exact SQL. That span-to-query linkage automates what &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; does manually, across every query variant, on every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Axis 2: Database metrics + infrastructure metrics = what constrained it.&lt;/strong&gt; CPU steal, disk I/O wait, and network throughput on the database host reveal whether a slowdown is a resource contention issue. A report query that normally completes in 25ms but suddenly takes 1.2 seconds, with no deployment in between, is usually competing for disk or CPU on a shared host rather than running a degraded plan (though lock contention, stale statistics, or index bloat can look similar). Without the infrastructure layer, you'd waste time chasing query-level explanations for a host-level problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Axis 3: Database metrics + logs = what sequence of events led to it.&lt;/strong&gt; Slow query logs, error logs, and lock contention events provide the narrative that metric time series cannot. Metrics show what changed. Logs explain what happened. For example, lock contention is one of the most common incident triggers, and the metric alone (rising lock wait count) doesn't tell you &lt;em&gt;which&lt;/em&gt; session is blocking. Querying &lt;code&gt;pg_stat_activity&lt;/code&gt; with &lt;a href="https://www.postgresql.org/docs/current/functions-info.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_blocking_pids()&lt;/code&gt;&lt;/a&gt; (PostgreSQL 9.6+; for earlier versions, query &lt;code&gt;pg_locks&lt;/code&gt; directly) pinpoints the blocking session, its query, and how long it's been holding the lock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;blocker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocker_pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;left&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocker_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;waiting&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;waiting_pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;left&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;waiting&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;waiting_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;blocker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state_change&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lock_held_for&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt; &lt;span class="n"&gt;waiting&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt; &lt;span class="n"&gt;blocker&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;blocker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_blocking_pids&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;waiting&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;waiting&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait_event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Lock'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, these three axes turn an alert into a causal chain: the trace identifies responsible queries, infrastructure metrics rule out host-level bottlenecks, and log correlation surfaces the trigger. Whether that chain resolves in one interface or across three separate tools depends on your platform and your alerting setup.&lt;/p&gt;

&lt;p&gt;Correlation closes the gap between alert and cause, but only if the alerts that wake you up are actually worth investigating.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alert fatigue is a design problem, platform choice is the fix
&lt;/h2&gt;

&lt;p&gt;Static thresholds on database metrics produce high false-positive rates. Query patterns vary by hour and day of week. A batch job that pushes p95 latency to 600ms every Tuesday at 3am is normal, not an incident. A static alert at 500ms pages you every Tuesday.&lt;/p&gt;

&lt;p&gt;Dynamic baselining eliminates this false-positive pattern. Instead of a hardcoded threshold, the alert fires when a metric deviates from its own rolling historical pattern for that time window. p95 at 600ms on Tuesday at 3am is expected. p95 at 600ms on Wednesday at 2pm is a deviation worth investigating.&lt;/p&gt;

&lt;p&gt;But dynamic baselining is only one piece. Whether you can actually implement it, and whether the alerts it produces are actionable, depends on what your observability platform supports. Alert quality is inseparable from platform choice. Six criteria separate a platform that sounds good in a demo from one that holds up at 3am:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coverage breadth.&lt;/strong&gt; Native support for your actual database mix (PostgreSQL, MySQL, MongoDB, RDS, Aurora, Azure SQL, and whatever else you run) is non-negotiable. Community plugins with no SLA add risk in production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query-level visibility.&lt;/strong&gt; CPU and connection counts are necessary but insufficient. You need per-query latency distributions, execution counts, and normalized query fingerprinting that aggregates variants of the same logical query. Without fingerprinting, you're scrolling through raw query strings instead of seeing the handful of patterns that account for most of your total execution time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cross-signal correlation.&lt;/strong&gt; If database metrics, APM traces, and infrastructure metrics live in separate tools, you're doing the correlation manually. That context switch is where time evaporates during incidents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alert quality.&lt;/strong&gt; Static thresholds versus dynamic baselining is the dividing line. Platforms that support rolling historical baselines eliminate most false positives from cyclical workload patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pricing model.&lt;/strong&gt; Per-host pricing behaves differently at 80 nodes than per-metric or per-GB pricing. Project the numbers against your current and expected fleet size before signing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operational overhead.&lt;/strong&gt; Agent deployment and upgrades across 80+ nodes compound over time. Centralized configuration, auto-upgrade, and agentless collection for cloud-managed databases (where agent deployment isn't an option) matter more than they appear in an initial evaluation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Criterion #4 (dynamic baselining) is where AI-driven features are pushing the boundary, moving beyond rolling averages into pattern detection that no human would configure manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI-assisted database monitoring: faster triage, not fewer engineers
&lt;/h2&gt;

&lt;p&gt;AI-driven features are gaining traction in observability platforms. The Grafana Observability Survey 2025 found that the two most sought-after AI capabilities were training-based alerts that fire on pattern deviations and faster root cause analysis through automated signal interpretation. These two ranked at the top across nearly every demographic surveyed. Autonomous remediation drew interest, but with significant practitioner skepticism. The pattern is clear: engineers want faster triage, not hands-off automation.&lt;/p&gt;

&lt;p&gt;Where AI adds the most value is in catching what no human would wire up manually: co-occurring metric changes across signals (a replication lag spike alongside a batch job CPU spike on the same host) that only correlate under specific conditions. Capacity forecasting is the other win, spotting growth trends that will cause pressure weeks before the pressure becomes a production incident.&lt;/p&gt;

&lt;p&gt;The judgment call that follows still requires a person. Deciding whether a flagged query needs a composite index, a denormalized read path, or a move to a different storage engine depends on access patterns, consistency requirements, and how the data model will evolve over the next two quarters. No anomaly detector has that context. AI narrows the search; an engineer who understands the domain decides what to do with what it finds.&lt;/p&gt;

&lt;p&gt;These capabilities come from the platform, not the pipeline. If you've built the OTel collection layer yourself, the question becomes what that self-assembled stack actually costs to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The operational cost of a self-assembled stack
&lt;/h2&gt;

&lt;p&gt;If you've followed along this far, you've assembled a capable observability pipeline: OTel Collector with four receivers, application SDK instrumentation, alerting rules, and cross-signal correlation. It works. But it's worth tallying what you're now maintaining.&lt;/p&gt;

&lt;p&gt;The Collector itself needs upgrades. Core and contrib &lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/1631" rel="noopener noreferrer"&gt;release together every two weeks&lt;/a&gt;, and each release can bring receiver config changes and semantic convention updates (the &lt;code&gt;db.statement&lt;/code&gt; to &lt;code&gt;db.query.text&lt;/code&gt; rename is a recent example). Across a fleet of 20+ database nodes, that's 20+ Collector configs to keep in sync. YAML drift is quiet until it causes a gap in your telemetry during an incident.&lt;/p&gt;

&lt;p&gt;Alert tuning is ongoing. Static thresholds need manual adjustment as workloads evolve. Dynamic baselines, if your backend supports them, need their own validation. Each new database instance means another set of receiver configs, user grants, and alert rules.&lt;/p&gt;

&lt;p&gt;Cloud-managed databases add a different kind of overhead. IAM policies, CloudWatch API rate limits, and the resolution gaps between standard and enhanced monitoring all require attention that scales with the number of instances.&lt;/p&gt;

&lt;p&gt;None of this is unreasonable for a team with dedicated platform engineering capacity. But for teams where observability is one responsibility among many, the assembly and maintenance cost is the real expense, not the software licenses. The next section walks through the implementation sequence; the managed alternative follows at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started: a concrete implementation sequence
&lt;/h2&gt;

&lt;p&gt;You can get the first piece of actionable data quickly. Run the &lt;code&gt;pg_stat_statements&lt;/code&gt; query from the PostgreSQL section above and see which queries dominate your database's total execution time. The full setup depends on your environment, but each step below is individually small.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Enable pg_stat_statements
&lt;/h3&gt;

&lt;p&gt;Check what's already loaded with &lt;code&gt;SHOW shared_preload_libraries;&lt;/code&gt;. If the result is empty, run &lt;code&gt;ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';&lt;/code&gt;. If other libraries are already loaded (e.g., &lt;code&gt;timescaledb&lt;/code&gt;), append rather than replace: &lt;code&gt;ALTER SYSTEM SET shared_preload_libraries = 'timescaledb, pg_stat_statements';&lt;/code&gt;. This requires a full PostgreSQL restart, which means a maintenance window in production. After the restart, run &lt;code&gt;CREATE EXTENSION pg_stat_statements;&lt;/code&gt; in your target database and query it immediately to get your baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Instrument your application with an OTel SDK
&lt;/h3&gt;

&lt;p&gt;The Collector pipeline in Step 3 collects infrastructure-level database metrics. Application-level database spans (the ones carrying &lt;code&gt;db.query.text&lt;/code&gt; that link to APM traces) require your application to emit them via an OTel SDK. Each language and database driver combination needs its own instrumentation library, SDK initialization, and exporter configuration. The &lt;a href="https://opentelemetry.io/ecosystem/registry/" rel="noopener noreferrer"&gt;OTel Instrumentation Registry&lt;/a&gt; covers the specific packages. For a team running multiple services across multiple languages, this step alone touches every application in the stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Deploy the OTel Collector
&lt;/h3&gt;

&lt;p&gt;Deploy the Collector with the &lt;code&gt;postgresql&lt;/code&gt; receiver on the same host, using the configuration from the pipeline section above. Point it at your backend via &lt;a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write" rel="noopener noreferrer"&gt;Prometheus remote write&lt;/a&gt; or an OTLP endpoint. Verify that &lt;code&gt;db.system.name&lt;/code&gt;, &lt;code&gt;db.name&lt;/code&gt;, and &lt;code&gt;db.query.text&lt;/code&gt; attributes are populating on spans from your application's database client library.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Set baseline alerts
&lt;/h3&gt;

&lt;p&gt;Three non-negotiable alerts to start with. If your platform supports dynamic baselining, use these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p95 SELECT latency more than 2x the 7-day rolling baseline for the same hour-of-week&lt;/li&gt;
&lt;li&gt;Connection utilization (active / max) above 80% sustained for 5 minutes&lt;/li&gt;
&lt;li&gt;Replication lag above 30 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: Verify cross-signal correlation
&lt;/h3&gt;

&lt;p&gt;Trigger a slow query manually with &lt;code&gt;SELECT pg_sleep(3);&lt;/code&gt; and confirm the resulting database span in your APM traces carries the &lt;code&gt;db.query.text&lt;/code&gt; attribute (or &lt;code&gt;db.statement&lt;/code&gt; if your library uses the older convention) and links back to the metric spike. If it doesn't, your pipeline has a tagging gap that will cost you during the next real incident. Fix it now while the system is quiet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Repeat for your next database
&lt;/h3&gt;

&lt;p&gt;Once PostgreSQL is fully instrumented and alerting is stable, repeat Steps 1 through 5 for your next database type. Each engine means a different receiver config, different monitoring user grants, different signal verification, and a different set of edge cases. A three-database stack means running this sequence three times, each with its own failure modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the DIY path delivers
&lt;/h2&gt;

&lt;p&gt;If you've followed the implementation sequence above, the 2:47am scenario from the introduction looks different now. Instead of fifteen minutes switching between dashboards, you have a single correlated timeline where the responsible query, the host contention, and the triggering event are already connected.&lt;/p&gt;

&lt;p&gt;That's the DIY path. It works, and it's entirely vendor-neutral. The tradeoff is the assembly and ongoing maintenance cost that scales with every database you add to the fleet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managed alternative: same criteria, less assembly
&lt;/h2&gt;

&lt;p&gt;For teams where that tradeoff doesn't pencil out, &lt;a href="https://www.manageengine.com/it-operations-management/database-monitoring.html" rel="noopener noreferrer"&gt;ManageEngine OpManager Nexus&lt;/a&gt; is one option worth evaluating. Here's how it maps against the six criteria from the alerting section:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coverage breadth:&lt;/strong&gt; Out-of-the-box monitoring for &lt;a href="https://www.manageengine.com/it-operations-management/database-monitoring.html" rel="noopener noreferrer"&gt;50+ database types&lt;/a&gt;, from PostgreSQL and MongoDB to managed offerings like Aurora and Azure SQL. No per-engine receiver assembly or contrib binary juggling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query-level visibility:&lt;/strong&gt; Latency distributions, execution frequency, and fingerprinted query grouping that rolls up thousands of raw statements into the patterns that actually drive load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-signal correlation:&lt;/strong&gt; Database, application, and host telemetry share a single interface. During an incident, you click from a slow query span to the host's CPU timeline without opening a second tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert quality:&lt;/strong&gt; ML-driven baselines that learn your workload's weekly rhythm, so the Tuesday 3am batch job doesn't page anyone but a Wednesday 2pm anomaly does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing model:&lt;/strong&gt; Priced per monitor rather than per GB of ingested telemetry. At 80+ database nodes, this distinction determines whether the bill scales linearly or exponentially.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational overhead:&lt;/strong&gt; Cloud-managed databases connect via JDBC and cloud APIs with no local agent. Self-managed instances use centralized config pushed from the server, so there's no per-node YAML to maintain or drift to chase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams whose telemetry lives primarily in AWS, Azure, or GCP, the cloud-delivered sibling is &lt;a href="https://www.site24x7.com/database-monitoring.html" rel="noopener noreferrer"&gt;Site24x7&lt;/a&gt;, ManageEngine's SaaS monitoring platform. The same six criteria apply: native coverage for PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, and RDS/Aurora; query-level latency with fingerprinting; correlated application and infrastructure metrics in one console; AI-driven anomaly detection on per-query baselines. The tradeoff flips compared to a self-hosted deployment. No local infrastructure to run, but telemetry leaves your environment, and retention is tied to the subscription tier.&lt;/p&gt;

&lt;p&gt;Whether the managed path or the DIY pipeline is the better fit depends on your team's platform engineering capacity and how many database types you're running. The six criteria give you a framework to evaluate either approach, or any other platform, on equal footing.&lt;/p&gt;




&lt;p&gt;What does your current database monitoring setup look like? If you're running a mixed stack, I'd be curious to hear how you're handling cross-signal correlation today, and where it still breaks down.&lt;/p&gt;

</description>
      <category>database</category>
      <category>devops</category>
      <category>monitoring</category>
      <category>performance</category>
    </item>
    <item>
      <title>LLM Inference Optimization: Techniques That Actually Reduce Latency and Cost</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:50:09 +0000</pubDate>
      <link>https://dev.to/damasosanoja/llm-inference-optimization-techniques-that-actually-reduce-latency-and-cost-3fjg</link>
      <guid>https://dev.to/damasosanoja/llm-inference-optimization-techniques-that-actually-reduce-latency-and-cost-3fjg</guid>
      <description>&lt;p&gt;Your GPU bill is doubling every quarter, but your throughput metrics haven’t moved. A standard Hugging Face pipeline() call keeps your A100 significantly underutilized under real traffic patterns because it processes one request sequentially while everything else waits. You’re paying for idle silicon.&lt;/p&gt;

&lt;p&gt;The fix is switching from naive serving to optimized serving, which means deploying the same model differently. High-performance teams running Llama-3-70B in production have converged on a specific stack: &lt;a href="https://docs.vllm.ai/en/latest/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; or &lt;a href="https://github.com/sgl-project/sglang" rel="noopener noreferrer"&gt;SGLang&lt;/a&gt; as the inference engine, &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt; for observability, and &lt;a href="https://www.runpod.io/" rel="noopener noreferrer"&gt;Runpod&lt;/a&gt; as the infrastructure layer that lets them deploy and iterate without managing a Kubernetes cluster. This guide works through that stack in ROI order: quantization (VRAM footprint), serving engine selection (throughput), speculative decoding (latency), and deployment mode (cost-scaling).&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The bottlenecks are compute and memory, not model size alone&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;LLM inference has two phases with different performance characteristics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/" rel="noopener noreferrer"&gt;Prefill is the compute-bound phase.&lt;/a&gt; The model processes your entire input prompt in a single forward pass, and &lt;a href="https://docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html" rel="noopener noreferrer"&gt;that determines your Time to First Token (TTFT)&lt;/a&gt;. On a dense 70B model, a 4,000-token prompt might take 400ms to prefill across a tensor-parallel A100 setup. You can’t parallelize this across requests in the same way, so the only real lever is raw compute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.vllm.ai/2025/09/05/anatomy-of-vllm.html" rel="noopener noreferrer"&gt;Decode is the memory-bound phase.&lt;/a&gt; The model generates one token at a time, and each step requires loading the entire model’s KV cache from GPU VRAM. &lt;a href="https://blog.vllm.ai/2025/09/05/anatomy-of-vllm.html" rel="noopener noreferrer"&gt;VRAM bandwidth almost entirely determines inter-token latency&lt;/a&gt;, with FLOPs playing a secondary role. An H100 SXM5 has &lt;a href="https://www.nvidia.com/en-us/data-center/h100/" rel="noopener noreferrer"&gt;3.35 TB/s of memory bandwidth&lt;/a&gt; versus an A6000’s 768 GB/s, which explains most of the latency delta between them on long-form generation.&lt;/p&gt;

&lt;p&gt;The KV cache is the core pressure point. For every token in a sequence, attention layers &lt;a href="https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/" rel="noopener noreferrer"&gt;store key and value tensors&lt;/a&gt;. The memory footprint follows this formula: num_layers × 2 × num_kv_heads × head_dim × seq_len × dtype_bytes. For Llama-3-70B (80 layers, GQA with 8 KV heads, head_dim=128) at BF16 (2 bytes): 80 × 2 × 8 × 128 × 4,096 × 2 ≈ 1.3 GB per request at a 4,096-token context. That number scales linearly with sequence length, which is why long-context workloads &lt;a href="https://www.bentoml.com/blog/what-is-gpu-memory-and-why-it-matters-for-llm-inference" rel="noopener noreferrer"&gt;saturate VRAM before FLOPs become the bottleneck&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt; lets you see this in real time. The &lt;a href="https://docs.vllm.ai/en/latest/serving/metrics.html" rel="noopener noreferrer"&gt;vLLM metrics endpoint&lt;/a&gt; exposes vllm:gpu_cache_usage_perc and vllm:num_requests_waiting via a /metrics endpoint. Wire those up to &lt;a href="https://grafana.com/" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt;, and you’ll immediately see when you’re cache-bound versus compute-bound, which tells you exactly which optimization to reach for first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzy8nm533gbrqdsh175n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzy8nm533gbrqdsh175n.png" alt="General Workflow" width="472" height="944"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For most teams serving 70B-class models under concurrent load, VRAM pressure arrives before compute does.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Quantization strategy: fit more models into less VRAM&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Quantization, specifically switching from BF16 to a 4-bit format, is the single biggest optimization available to most teams. At the unit economics level, a Llama-3-70B model in BF16 &lt;a href="https://community.ibm.com/community/user/cloud/blogs/arindam-dasgupta/2024/09/18/calculating-gpu-requirements-for-efficient-llama-3" rel="noopener noreferrer"&gt;occupies roughly 140GB of VRAM&lt;/a&gt;, which requires at a minimum two H100 80GB GPUs at roughly \$2.69/hr each on Runpod. The same model in 4-bit AWQ &lt;a href="https://www.theregister.com/2024/07/14/quantization_llm_feature/" rel="noopener noreferrer"&gt;fits comfortably on dual RTX A6000s (96GB total)&lt;/a&gt;, which run at approximately \$0.49/hr per GPU on Runpod. That’s over 80% cost reduction with minimal quality loss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2306.00978" rel="noopener noreferrer"&gt;AWQ (Activation-Aware Weight Quantization)&lt;/a&gt; is the current standard for Llama-class models. AWQ preserves the 1% of weights that have the most impact on activation outputs, which is why the perplexity delta between a well-quantized AWQ model and its BF16 source is often below 0.5 points on standard benchmarks.&lt;/p&gt;

&lt;p&gt;You don’t need to quantize the model yourself. The TechxGenus collection on &lt;a href="https://huggingface.co/TechxGenus" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; includes production-ready AWQ versions of Llama-3-70B. Deploying it on a Runpod Pod requires pulling the vLLM Docker image and configuring your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_token &lt;span class="se"&gt;\&lt;/span&gt;
  vllm/vllm-openai:latest &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; TechxGenus/Meta-Llama-3-70B-Instruct-AWQ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quantization&lt;/span&gt; awq &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 8192
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/" rel="noopener noreferrer"&gt;H100s support native FP8 tensor cores&lt;/a&gt;, so if you have access to them, FP8 quantization is worth evaluating. FP8 inference runs without emulation overhead, vLLM enables it with --quantization fp8, and &lt;a href="https://docs.vllm.ai/en/v0.5.4/quantization/fp8.html" rel="noopener noreferrer"&gt;VRAM usage drops by roughly 50% compared to BF16&lt;/a&gt;. The throughput improvement over BF16 reaches up to 1.6x on generation-heavy workloads, which means you can &lt;a href="https://lambda.ai/blog/nvidia-hopper-h100-and-fp8-support" rel="noopener noreferrer"&gt;serve a 70B model on a single H100 SXM&lt;/a&gt; with headroom for longer contexts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/casper-hansen/AutoAWQ" rel="noopener noreferrer"&gt;AutoAWQ&lt;/a&gt; quantizes a custom fine-tuned checkpoint in Python in under 30 minutes on an A10G:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoAWQForCausalLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;

&lt;span class="n"&gt;model_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-finetuned-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;quant_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-model-awq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;quant_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zero_point&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_group_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w_bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoAWQForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;quantize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quant_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;quant_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_quantized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quant_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With your model’s VRAM footprint reduced, the next constraint is how efficiently your serving engine keeps the GPU saturated under real traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Throughput and structured generation with vLLM and SGLang&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Continuous batching, introduced in &lt;a href="https://www.usenix.org/conference/osdi22/presentation/yu" rel="noopener noreferrer"&gt;Orca (2022)&lt;/a&gt; and &lt;a href="https://blog.vllm.ai/2023/06/20/vllm.html" rel="noopener noreferrer"&gt;implemented in vLLM&lt;/a&gt;, is what makes modern serving engines work. Traditional static batching &lt;a href="https://www.anyscale.com/blog/continuous-batching-llm-inference" rel="noopener noreferrer"&gt;waits for a full batch of requests to complete before starting new ones&lt;/a&gt;. Continuous batching inserts new requests into the decode loop as soon as a slot opens up, keeping GPU utilization well above what you see with sequential processing. &lt;a href="https://www.21medien.de/en/library/continuous-batching" rel="noopener noreferrer"&gt;Real-world figures run 60-85%&lt;/a&gt; under steady traffic compared to the low utilization of naive serving.&lt;/p&gt;

&lt;p&gt;vLLM also implements PagedAttention, which &lt;a href="https://arxiv.org/abs/2309.06180" rel="noopener noreferrer"&gt;treats VRAM like virtual memory for KV cache&lt;/a&gt;, eliminating the need to pre-allocate contiguous blocks. PagedAttention allows more sequences to coexist in memory simultaneously, directly improving throughput on concurrent workloads.&lt;/p&gt;

&lt;p&gt;For agentic workflows, multi-step chains, and structured JSON output, &lt;a href="https://github.com/sgl-project/sglang" rel="noopener noreferrer"&gt;SGLang&lt;/a&gt; frequently outperforms standard vLLM. SGLang’s RadixAttention mechanism automatically reuses the KV cache for shared prompt prefixes across requests. In an agentic workflow where every request starts with the same system prompt and tool definitions (often 1,000+ tokens), RadixAttention computes that prefix once and caches it rather than recomputing it per request. &lt;a href="https://lmsys.org/blog/2024-01-17-sglang/" rel="noopener noreferrer"&gt;LMSYS benchmark data shows SGLang consistently delivering higher throughput on structured generation tasks&lt;/a&gt; compared to equivalent vLLM configurations, specifically because of this shared prefix optimization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9zloywwafeo93aakpvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9zloywwafeo93aakpvu.png" alt="vLLM vs. SGLang decision matrix" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few flags have an outsized impact when you deploy via a Runpod Pod or template, regardless of which engine you’re running. For vLLM, --max-num-seqs controls the maximum number of sequences in the batch. Set it too high and you’ll OOM. Set it too low, and you leave throughput on the table. A reasonable starting point for dual A6000s with a quantized 70B is --max-num-seqs 64. Add --disable-log-stats in production to eliminate logging overhead that adds a few milliseconds per batch on high-QPS endpoints.&lt;/p&gt;

&lt;p&gt;For SGLang, --tp 2 sets tensor parallelism across two GPUs. --chunked-prefill-size 512 controls chunked prefill, which prevents long prompts from monopolizing the GPU and improves latency fairness across concurrent requests. Start with 512 for mixed-length workloads. Increase to 1024 if your traffic is predominantly short prompts, or drop to 256 if you’re seeing latency spikes from long system prompts under concurrent load.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Speculative decoding: cut latency without changing hardware&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If your workload skews toward long-form generation (coding assistants, document summarization, report generation), speculative decoding is one of the biggest latency reductions you can get without changing hardware.&lt;/p&gt;

&lt;p&gt;A small draft model (typically 1-7B parameters) generates 3-12 candidate tokens per step. The large target model &lt;a href="https://research.google/blog/looking-back-at-speculative-decoding/" rel="noopener noreferrer"&gt;verifies all candidates in a single parallel forward pass&lt;/a&gt;. When the draft model guesses correctly (at rates as high as 70-90% with a well-matched draft model on domain-specific tasks), you get multiple tokens for roughly the cost of one target model step. &lt;a href="https://arxiv.org/abs/2211.17192" rel="noopener noreferrer"&gt;Research on speculative decoding&lt;/a&gt; shows 2-3x speedups on generation-heavy tasks.&lt;/p&gt;

&lt;p&gt;The economic case is direct: if you’re paying \$3/hr for your inference endpoint and speculative decoding cuts latency by 2x, you either halve your cost per request at the same throughput or serve twice the requests at the same cost. Neither requires touching your hardware configuration.&lt;/p&gt;

&lt;p&gt;Deploying a speculative decoding setup with the &lt;a href="https://docs.runpod.io/sdks/python/overview" rel="noopener noreferrer"&gt;Runpod SDK&lt;/a&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;runpod&lt;/span&gt;

&lt;span class="n"&gt;runpod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;pod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runpod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_pod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3-70b-speculative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;image_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm/vllm-openai:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gpu_type_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVIDIA RTX A6000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gpu_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;container_disk_in_gb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_hf_token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;docker_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--model TechxGenus/Meta-Llama-3-70B-Instruct-AWQ &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--quantization awq &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--tensor-parallel-size 2 &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--speculative-model TechxGenus/Meta-Llama-3-8B-Instruct-AWQ &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--num-speculative-tokens 5 &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--max-model-len 8192&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pod ID:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pod&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The draft model must come from the same model family as your target. Llama-3-8B-Instruct-AWQ as a draft model for Llama-3-70B-Instruct-AWQ is the canonical pairing. Mismatched architectures produce low acceptance rates that eliminate the speedup entirely. You can verify the draft model’s effectiveness via vLLM’s vllm:spec_decode_draft_acceptance_length metric in Prometheus. If the acceptance rate falls below roughly 0.5 tokens per step, the draft model is poorly matched, and speculative decoding is adding overhead rather than reducing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serverless vs. pods: architecting for cost&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.runpod.io/serverless/overview" rel="noopener noreferrer"&gt;Runpod Serverless&lt;/a&gt; scales to zero between requests and spins up workers on demand. Billing is per-second of GPU time, so you pay only while a worker is active with no reserved-capacity cost during idle periods. This is the right choice for spiky, unpredictable traffic (a chatbot that sees 1,000 concurrent users at 9 am and 20 at 3 am, for example). The historical objection to serverless LLM hosting was cold start time: loading a large model from cold could take a minute or more, making the first request in any cold-start window intolerable. Runpod’s FlashBoot technology reduces this through container-level and image-level optimizations, making cold starts practical for production use.&lt;/p&gt;

&lt;p&gt;Runpod Pods are persistent GPU instances billed per-second. Use them when your traffic is sustained, when you’re running fine-tuning jobs with &lt;a href="https://docs.ray.io/en/latest/" rel="noopener noreferrer"&gt;Ray&lt;/a&gt;, or when you need consistent latency guarantees for SLA-bound endpoints. A Ray-based distributed fine-tuning job &lt;a href="https://docs.ray.io/en/latest/train/overview.html" rel="noopener noreferrer"&gt;requires consistent inter-node communication&lt;/a&gt; that serverless cold starts would interrupt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2xj3ghwyxni0f44gtbc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2xj3ghwyxni0f44gtbc.png" alt="Runpod serverless" width="800" height="901"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Setup time matters too. The delta between Runpod and bare-metal providers like &lt;a href="https://lambdalabs.com/" rel="noopener noreferrer"&gt;Lambda Labs&lt;/a&gt; is large. Reaching an equivalent setup on a bare VM requires provisioning the instance, configuring the OS and CUDA drivers, installing Docker, setting up your orchestration layer (Kubernetes or Slurm), deploying your inference container, configuring autoscaling rules, and wiring up your load balancer. That’s a realistic two-week sprint for an engineer who hasn’t done it before. On Runpod, you select a &lt;a href="https://www.runpod.io/console/explore" rel="noopener noreferrer"&gt;vLLM template&lt;/a&gt;, set your environment variables, and your endpoint is live in minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lambdalabs.com/" rel="noopener noreferrer"&gt;Lambda Labs&lt;/a&gt; has competitive hardware pricing, but the managed serving layer is thin and you still own the orchestration. If your workload needs auto-scaling inference with short-lived, per-request billing, Runpod’s Serverless infrastructure handles that out of the box. &lt;a href="https://www.coreweave.com/" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt; targets enterprises with reserved contracts, which is the wrong motion for a seed-stage startup that needs to validate unit economics before committing to reserved capacity.&lt;/p&gt;

&lt;p&gt;Platform selection is the last dial, but it’s not a small one. A well-optimized model stack on the wrong infrastructure still produces the wrong billing curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The optimization sequence&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Start with quantization (AWQ or FP8, depending on your hardware). It’s a one-time change that cuts your VRAM requirements significantly, roughly 75% with 4-bit AWQ or 50% with FP8, and immediately opens up cheaper GPU classes. Then choose your serving engine: SGLang for agentic and structured-output workloads, vLLM for chat and general inference. Add speculative decoding if long-form generation is in your critical path. Monitor everything with &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt; so you’re reacting to actual bottlenecks rather than guesses.&lt;/p&gt;

&lt;p&gt;Your implementation checklist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Quantize with AWQ (or FP8 on H100s) using &lt;a href="https://github.com/casper-hansen/AutoAWQ" rel="noopener noreferrer"&gt;AutoAWQ&lt;/a&gt; or a pre-quantized Hugging Face checkpoint&lt;/li&gt;
&lt;li&gt;Choose your engine: &lt;a href="https://github.com/sgl-project/sglang" rel="noopener noreferrer"&gt;SGLang&lt;/a&gt; for agents and JSON output, &lt;a href="https://docs.vllm.ai/en/latest/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; for chat throughput&lt;/li&gt;
&lt;li&gt;Enable speculative decoding on generation-heavy endpoints&lt;/li&gt;
&lt;li&gt;Wire up Prometheus to vllm:gpu_cache_usage_perc before you go to production&lt;/li&gt;
&lt;li&gt;Match your deployment mode to your traffic pattern: Serverless for spiky, Pods for sustained&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A profitable inference endpoint runs on a well-chosen software stack deployed quickly. The hardware matters far less than most teams assume.&lt;/p&gt;

&lt;p&gt;If you’ve run into a different bottleneck or found a combination that works better for your workload, I’d genuinely like to hear it. Drop what you’ve learned in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
    <item>
      <title>Stop Tuning Blind: Query Observability as the Foundation for Database Optimization</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Tue, 24 Mar 2026 11:46:49 +0000</pubDate>
      <link>https://dev.to/damasosanoja/stop-tuning-blind-query-observability-as-the-foundation-for-database-optimization-113p</link>
      <guid>https://dev.to/damasosanoja/stop-tuning-blind-query-observability-as-the-foundation-for-database-optimization-113p</guid>
      <description>&lt;p&gt;A team notices a checkout endpoint slowing down. Response times have crept from 80ms to 900ms over two weeks, but the infrastructure dashboard shows nothing abnormal. So the engineer does what most teams do first: adds an index on the column mentioned in the ticket, deploys, and moves on.&lt;/p&gt;

&lt;p&gt;Two weeks later, the same endpoint is slow again. A different engineer adds another index. Then another. The table now carries 23 indexes. Every &lt;code&gt;INSERT&lt;/code&gt; pays write amplification across all of them. The original slow query is still slow, because the root cause was never the missing index. Stale statistics after a schema migration had triggered a plan regression, and no one caught it because no one was watching &lt;a href="https://www.site24x7.com/what-is-database-monitoring.html" rel="noopener noreferrer"&gt;query-level execution data&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This guide inverts the usual approach. Instead of starting with indexing techniques and treating observability as an afterthought, it starts with the telemetry pipeline: how to capture query-level execution data, correlate it with application traces, and build the feedback loop that makes every subsequent optimization decision measurable. From there, it moves into execution plan analysis, indexing strategies, and resource management, each one grounded in the signals your pipeline surfaces. The principles apply across PostgreSQL, MySQL, and most relational engines. It assumes working knowledge of SQL and basic database administration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instrumenting before you optimize
&lt;/h2&gt;

&lt;p&gt;Database optimization requires three categories of signals, and most teams have at best one of them in place.&lt;/p&gt;

&lt;p&gt;The first is &lt;strong&gt;query execution metrics&lt;/strong&gt;: per-query call count, mean latency, execution time standard deviation, rows scanned versus rows returned, and cache hit ratio. In PostgreSQL, &lt;code&gt;pg_stat_statements&lt;/code&gt; captures these metrics directly, though &lt;a href="https://medium.com/javarevisited/mastering-latency-metrics-p90-p95-p99-d5427faea879" rel="noopener noreferrer"&gt;p99 latency&lt;/a&gt; approximations require &lt;code&gt;pg_stat_monitor&lt;/code&gt; (which provides histogram-based latency distributions) or an external metrics store for precise percentile calculations (&lt;code&gt;stddev_exec_time&lt;/code&gt; is the closest proxy &lt;code&gt;pg_stat_statements&lt;/code&gt; provides). Enable it by adding the extension to &lt;code&gt;shared_preload_libraries&lt;/code&gt;, restarting the server, and creating the extension in each target database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- postgresql.conf (restart required after saving)&lt;/span&gt;
&lt;span class="c1"&gt;-- In managed clouds like AWS RDS or GCP Cloud SQL, enable via Parameter Groups or database flags&lt;/span&gt;
&lt;span class="n"&gt;shared_preload_libraries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pg_stat_statements'&lt;/span&gt;
&lt;span class="c1"&gt;-- pg_stat_statements.track = top   -- default: tracks only top-level statements&lt;/span&gt;
&lt;span class="c1"&gt;-- Set to 'all' if your workload runs queries inside functions or stored procedures&lt;/span&gt;
&lt;span class="c1"&gt;-- After restart, run in each target database&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;pg_stat_statements&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Top consumers by total execution time&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_exec_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;mean_exec_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stddev_exec_time&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_statements&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_exec_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MySQL, the Performance Schema is enabled by default and provides equivalent data. Sort by total time consumed, not worst-case single execution. A query that takes 20ms per call but runs 50,000 times per hour contributes 1,000 seconds of database time, far more than a 5-second query that runs twice a day.&lt;/p&gt;

&lt;p&gt;The second signal is &lt;strong&gt;infrastructure-level database metrics&lt;/strong&gt;: connection counts, operation rates, and table I/O. The &lt;a href="https://opentelemetry.io/docs/collector/" rel="noopener noreferrer"&gt;OpenTelemetry Collector&lt;/a&gt; (&lt;code&gt;otelcol-contrib&lt;/code&gt;, not the core distribution) scrapes these on a configurable interval with no application code changes:&lt;/p&gt;

&lt;p&gt;First, create the monitoring user with the required permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create monitoring user (PostgreSQL 10+)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;otel_monitor&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'your_password'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;pg_monitor&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;otel_monitor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- covers pg_stat_statements, pg_stat_activity, etc.&lt;/span&gt;
&lt;span class="c1"&gt;-- If pg_monitor is unavailable (pre-10), grant individually:&lt;/span&gt;
&lt;span class="c1"&gt;-- GRANT SELECT ON pg_stat_statements TO otel_monitor;&lt;/span&gt;
&lt;span class="c1"&gt;-- GRANT SELECT ON pg_stat_user_tables TO otel_monitor;&lt;/span&gt;
&lt;span class="c1"&gt;-- On AWS RDS and GCP Cloud SQL, pg_monitor is available and the preferred approach.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure the collector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgresql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost:5432&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel_monitor&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${env:PG_PASSWORD}&lt;/span&gt;
    &lt;span class="na"&gt;collection_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
    &lt;span class="na"&gt;databases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;myapp_prod&lt;/span&gt;

&lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-backend:4317&lt;/span&gt;

&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;postgresql&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The third signal is &lt;strong&gt;application traces&lt;/strong&gt;. &lt;a href="https://opentelemetry.io/docs/languages/" rel="noopener noreferrer"&gt;Auto-instrumentation libraries&lt;/a&gt; for most languages and database clients (Python and Java have the most mature support; Go and Rust require more manual setup) emit a trace span for every database call, carrying the query text and operation type as span attributes. Without application-level tracing, you can identify slow queries but not which service, endpoint, or user action generated them.&lt;/p&gt;

&lt;p&gt;With all three in place, build a baseline dashboard before changing anything. Run four panels for at least one full business cycle (24 to 48 hours): top queries by total execution time, active connections over time, cache hit ratio, and index scan versus sequential scan ratio per table. Grafana works well for this. The baseline is what you compare against after every optimization. Skip it, and you can't confirm whether a change helped or quantify by how much.&lt;/p&gt;

&lt;p&gt;If assembling this stack in-house isn't the right fit, hosted platforms like &lt;a href="https://www.site24x7.com/database-monitoring.html" rel="noopener noreferrer"&gt;Site24x7&lt;/a&gt; collect the same signal categories across PostgreSQL, MySQL, SQL Server, and RDS/Aurora. The rest of this guide applies regardless of where the telemetry lives.&lt;/p&gt;

&lt;p&gt;The next section uses these signals to read execution plans and identify what needs fixing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reading what your telemetry surfaces
&lt;/h2&gt;

&lt;p&gt;Your pipeline is collecting query metrics, infrastructure signals, and application traces. The next step is interpreting what they reveal. Three patterns account for the majority of production database problems, and each one leaves a distinct signature in your telemetry before it becomes a user-facing incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  Plan regressions
&lt;/h3&gt;

&lt;p&gt;Plan regressions appear as a sudden or gradual increase in execution time for a specific query fingerprint, with no corresponding change in query text. The &lt;a href="https://www.postgresql.org/docs/current/planner-optimizer.html" rel="noopener noreferrer"&gt;query planner makes cost-based decisions&lt;/a&gt; using statistics about row counts and value distributions. When those &lt;a href="https://www.postgresql.org/docs/current/planner-stats.html" rel="noopener noreferrer"&gt;statistics go stale&lt;/a&gt; after a bulk load, a migration, or months of organic growth, the planner's row estimate diverges from reality, and the planner picks a worse access path. Your &lt;code&gt;pg_stat_statements&lt;/code&gt; data will show the regression as a jump in &lt;code&gt;mean_exec_time&lt;/code&gt; for that fingerprint. The execution plan confirms it.&lt;/p&gt;

&lt;p&gt;Running &lt;a href="https://www.postgresql.org/docs/current/sql-explain.html" rel="noopener noreferrer"&gt;&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;&lt;/a&gt; on the offending query produces the actual execution, not just the planner's estimate. Here is what a plan regression looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Output (simplified):&lt;/span&gt;
&lt;span class="c1"&gt;-- Seq Scan on events  (cost=0.00..18450.00 rows=50 width=64)&lt;/span&gt;
&lt;span class="c1"&gt;--                     (actual time=0.042..312.7 rows=180000 loops=1)&lt;/span&gt;
&lt;span class="c1"&gt;--   Filter: (user_id = 42)&lt;/span&gt;
&lt;span class="c1"&gt;--   Rows Removed by Filter: 320000&lt;/span&gt;
&lt;span class="c1"&gt;-- Planning Time: 0.08 ms&lt;/span&gt;
&lt;span class="c1"&gt;-- Execution Time: 458.3 ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The planner estimated 50 rows; the actual count was 180,000, a 3,600x divergence. The &lt;code&gt;Seq Scan&lt;/code&gt; node confirms no index was used, even though one exists on &lt;code&gt;user_id&lt;/code&gt;. The &lt;code&gt;Rows Removed by Filter&lt;/code&gt; line shows 320,000 rows were read and discarded. Refreshing statistics manually after large data changes is standard practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL: refresh statistics for a specific table&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- MySQL: equivalent command&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running &lt;code&gt;ANALYZE&lt;/code&gt;, re-execute the &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;. If the row estimate now matches reality and the planner switches to an index scan, stale statistics were the root cause.&lt;/p&gt;

&lt;p&gt;Stale statistics are the most common trigger, but plan regressions can also surface through changes in join strategy or CTE materialization. &lt;a href="https://use-the-index-luke.com/sql/join/nested-loops-join-n1-problem" rel="noopener noreferrer"&gt;Nested loop joins&lt;/a&gt; are efficient when one side is small and indexed; &lt;a href="https://www.postgresql.org/docs/current/planner-optimizer.html" rel="noopener noreferrer"&gt;hash joins handle larger unindexed sets, and merge joins work best on pre-sorted input&lt;/a&gt;. When the planner switches strategy between deploys your execution plan will show the new join node and your &lt;code&gt;pg_stat_statements&lt;/code&gt; data will show the performance delta. The same diagnostic applies: compare estimated versus actual rows and check whether stale statistics or data growth changed the cost calculation.&lt;/p&gt;

&lt;p&gt;A related case is Common Table Expression materialization. In PostgreSQL 12 and later, &lt;a href="https://www.postgresql.org/docs/current/queries-with.html" rel="noopener noreferrer"&gt;CTEs are inlined by default&lt;/a&gt; if they are non-recursive, referenced only once, and free of side-effects. In PostgreSQL 11 and earlier, &lt;a href="https://www.enterprisedb.com/blog/postgresqls-ctes-are-optimisation-fences" rel="noopener noreferrer"&gt;all CTEs are materialized as optimization fences&lt;/a&gt;, preventing predicate pushdown into the CTE body. When a CTE is referenced multiple times, PostgreSQL still materializes it to avoid duplicate computation unless you explicitly specify &lt;code&gt;NOT MATERIALIZED&lt;/code&gt;. If your telemetry shows a query &lt;a href="https://hakibenita.com/be-careful-with-cte-in-postgre-sql" rel="noopener noreferrer"&gt;scanning far more rows than expected through a CTE&lt;/a&gt;, check whether materialization is forcing a full scan where a filtered one would suffice. The first diagnostic question is whether the CTE executes once per query or once per row in a join.&lt;/p&gt;

&lt;h3&gt;
  
  
  Contention
&lt;/h3&gt;

&lt;p&gt;Contention shows a different signature. Instead of one query getting slower, many connections wait on the same resource simultaneously. A &lt;code&gt;SHOW PROCESSLIST&lt;/code&gt; (MySQL) or &lt;code&gt;SELECT * FROM pg_stat_activity&lt;/code&gt; (PostgreSQL) &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html" rel="noopener noreferrer"&gt;during the incident&lt;/a&gt; might show 140 connections blocked on a table-level lock held by a single long-running transaction.&lt;/p&gt;

&lt;p&gt;Your telemetry surfaces this pattern through execution time variance. The same query fingerprint alternates between 5ms and 4 seconds depending on whether it hits the lock window, producing a high &lt;code&gt;stddev_exec_time&lt;/code&gt; relative to &lt;code&gt;mean_exec_time&lt;/code&gt; in &lt;code&gt;pg_stat_statements&lt;/code&gt;. When you see that ratio spike, investigate lock waits before assuming a plan problem. Contention-driven variance affects multiple unrelated fingerprints at the same time; if only a single fingerprint shows high stddev, the cause is more likely an inherently variable workload than a locking issue.&lt;/p&gt;

&lt;p&gt;To identify the blocking session, use &lt;code&gt;pg_blocking_pids()&lt;/code&gt; (PostgreSQL 9.6+):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find blocking sessions and what they are running&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;blocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;blocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocked_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;blocking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocking_pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;blocking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;blocking_query&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt; &lt;span class="n"&gt;blocked&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt; &lt;span class="n"&gt;blocking&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;blocking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_blocking_pids&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;cardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_blocking_pids&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MySQL equivalent joins &lt;code&gt;performance_schema.data_lock_waits&lt;/code&gt; with &lt;code&gt;performance_schema.threads&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Maintenance drift
&lt;/h3&gt;

&lt;p&gt;Maintenance drift is the slowest-moving pattern, and the hardest to notice because no single event triggers it. Over weeks and months, dead index entries accumulate from row updates and deletes, &lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html" rel="noopener noreferrer"&gt;statistics go stale&lt;/a&gt; as migrations reshape data distributions, and indexes that once matched hot access patterns quietly fall out of alignment with what the application actually queries. None of this shows up on a standard infrastructure dashboard.&lt;/p&gt;

&lt;p&gt;What your telemetry &lt;em&gt;does&lt;/em&gt; surface is a gradual increase in the rows-scanned-to-rows-returned ratio across multiple query fingerprints, often paired with a declining cache hit ratio. When a query scans 200,000 rows to return 40, the planner is telling you it can't satisfy that predicate with any existing index. A partial or expression index often closes the gap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Diagnostic triage: from signal to action
&lt;/h3&gt;

&lt;p&gt;The following decision tree maps each telemetry pattern to its diagnostic path and the section that addresses the fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A["Telemetry signal detected"] --&amp;gt; B{"Signal pattern?"}
    B --&amp;gt;|"mean_exec_time jump,&amp;lt;br&amp;gt;single fingerprint"| C["Plan regression"]
    B --&amp;gt;|"High stddev_exec_time,&amp;lt;br&amp;gt;multiple fingerprints"| D["Contention"]
    B --&amp;gt;|"Gradual scan ratio rise,&amp;lt;br&amp;gt;cache hit ratio decline"| E["Maintenance drift"]
    C --&amp;gt; F["EXPLAIN ANALYZE: compare&amp;lt;br&amp;gt;estimated vs. actual rows"]
    F --&amp;gt;|"Stale statistics"| G["ANALYZE table, re-check plan"]
    F --&amp;gt;|"Wrong access path"| H["See: Indexing decisions"]
    D --&amp;gt; I["pg_stat_activity /&amp;lt;br&amp;gt;SHOW PROCESSLIST"]
    I --&amp;gt;|"Connection saturation"| J["See: Connection pooling"]
    I --&amp;gt;|"Single lock holder"| K["Identify blocking transaction"]
    E --&amp;gt; L["pgstatindex for bloat /&amp;lt;br&amp;gt;table size for growth"]
    L --&amp;gt;|"Index bloat"| M["See: Index maintenance"]
    L --&amp;gt;|"Unbounded table growth"| N["See: Table partitioning"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you know which queries need attention and why the planner chose poorly, the next question is what structural change fixes it. Indexing decisions, grounded in the signals your telemetry just surfaced, are where that answer starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Indexing decisions driven by what the data shows
&lt;/h2&gt;

&lt;p&gt;The next step is the structural change that fixes what the planner got wrong. Indexing is the most common response to a slow query, and the most commonly misconfigured one. A well-chosen index can cut execution time by orders of magnitude; a poorly chosen one adds write overhead with no measurable read benefit. The difference depends on matching the index design to what your signals actually showed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Composite index column ordering
&lt;/h3&gt;

&lt;p&gt;An Index Scan in the execution plan does not guarantee efficiency. If the planner is still reading far more rows than it returns, the index exists but its &lt;a href="https://use-the-index-luke.com/sql/where-clause/the-equals-sign/concatenated-keys" rel="noopener noreferrer"&gt;column order doesn't match the query's predicate structure&lt;/a&gt;. The general rule for &lt;a href="https://www.postgresql.org/docs/current/indexes-multicolumn.html" rel="noopener noreferrer"&gt;multi-column indexes&lt;/a&gt;: equality predicates go first, then sorting columns (for &lt;code&gt;ORDER BY&lt;/code&gt; or &lt;code&gt;GROUP BY&lt;/code&gt;), and range predicates go last.&lt;/p&gt;

&lt;p&gt;Consider a query filtering by &lt;code&gt;user_id&lt;/code&gt; and ranging on &lt;code&gt;created_at&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Suboptimal: range predicate on the leading column&lt;/span&gt;
&lt;span class="c1"&gt;-- The index can only be used for the created_at range;&lt;/span&gt;
&lt;span class="c1"&gt;-- user_id filtering happens after the scan, not during it&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_events_ts_user&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Correct: equality first, range last&lt;/span&gt;
&lt;span class="c1"&gt;-- The index narrows to all rows for user 42, then scans only the timestamp range&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_events_user_ts&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Putting &lt;code&gt;user_id&lt;/code&gt; first collapses the initial scan to a single user's rows before the range scan begins. The same principle extends to sorting: placing a range predicate &lt;em&gt;before&lt;/em&gt; the sort column can &lt;a href="https://use-the-index-luke.com/sql/sorting-grouping/index-for-sorting" rel="noopener noreferrer"&gt;force an expensive in-memory sort&lt;/a&gt; instead of using the index's native ordering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partial (filtered) indexes
&lt;/h3&gt;

&lt;p&gt;When the scan ratio is high only for queries targeting a narrow subset, like the few thousand pending rows in a million-row job queue, a full index wastes I/O on rows those queries never touch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Only index rows where work still needs to happen&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_jobs_pending&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting index is orders of magnitude smaller than the full alternative. Because &lt;a href="https://www.postgresql.org/docs/current/indexes-partial.html" rel="noopener noreferrer"&gt;the query planner recognizes the predicate&lt;/a&gt;, it uses the partial index directly for queries that include &lt;code&gt;WHERE status = 'pending'&lt;/code&gt;. The trade-off is specificity: if your application queries other status values with similar frequency, you'll need separate partial indexes or a full one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Expression (functional) indexes
&lt;/h3&gt;

&lt;p&gt;Sometimes the predicate itself is the problem. When a query filters on a transformed column like &lt;code&gt;LOWER(email)&lt;/code&gt;, a standard B-tree index on the raw column is useless because the planner cannot match the transformation to the stored index entries. An &lt;a href="https://www.postgresql.org/docs/current/indexes-expressional.html" rel="noopener noreferrer"&gt;expression index&lt;/a&gt; indexes the output of the function, not the column itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Case-insensitive email lookup&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_users_email_lower&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user@example.com'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- JSON field extraction&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_events_payload_type&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'event_type'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'event_type'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'checkout'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query predicate must match the indexed expression exactly. &lt;code&gt;WHERE LOWER(email) = '...'&lt;/code&gt; hits &lt;code&gt;idx_users_email_lower&lt;/code&gt;; while &lt;code&gt;WHERE email ILIKE '...'&lt;/code&gt; does not, because the planner treats them as distinct operations. MySQL supports expression indexes from version 8.0 with the same identity requirement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Covering indexes
&lt;/h3&gt;

&lt;p&gt;The heap fetch is one of the most under valued performance bottlenecks. Even when the planner picks the right index and row estimates are accurate, each index hit triggers a random I/O back to the table to retrieve columns not stored in the index. A &lt;a href="https://www.postgresql.org/docs/current/indexes-index-only-scans.html" rel="noopener noreferrer"&gt;covering index&lt;/a&gt; eliminates that secondary lookup by including every column the query needs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Hot path query on a multi-tenant SaaS table&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Covering index satisfies the full query from the index alone&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_users_tenant_active&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;INCLUDE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/current/indexes-index-only-scans.html" rel="noopener noreferrer"&gt;&lt;code&gt;INCLUDE&lt;/code&gt; clause&lt;/a&gt; attaches non-key columns to the index leaf pages without affecting the B-tree structure. &lt;a href="https://www.postgresql.org/docs/current/indexes-index-only-scans.html" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; and &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns" rel="noopener noreferrer"&gt;SQL Server&lt;/a&gt; support it directly. &lt;a href="https://dev.mysql.com/doc/refman/9.3/en/innodb-index-types.html" rel="noopener noreferrer"&gt;MySQL (InnoDB)&lt;/a&gt; has no &lt;code&gt;INCLUDE&lt;/code&gt; keyword, but every secondary index already carries the Primary Key at its leaf nodes, so you achieve the same effect by appending the extra columns to a &lt;a href="https://dev.mysql.com/doc/en/create-index.html" rel="noopener noreferrer"&gt;standard index definition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The payoff is most pronounced on frequently executed queries where the heap fetch accounts for a measurable share of execution time. The cost is a larger index and added write overhead per row change, so covering indexes make sense for critical hot paths, not general use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Index bloat and maintenance
&lt;/h3&gt;

&lt;p&gt;Your telemetry shows a pattern consistent with maintenance drift: cache hit ratio declining gradually, scan times rising across multiple query fingerprints with no corresponding change in query text or data volume. Dead index entries from row updates and deletes are a common cause. In PostgreSQL, the &lt;a href="https://www.postgresql.org/docs/current/pgstattuple.html" rel="noopener noreferrer"&gt;&lt;code&gt;pgstattuple&lt;/code&gt; extension&lt;/a&gt; provides the &lt;code&gt;pgstatindex&lt;/code&gt; function to measure B-tree bloat directly via page density:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Install the extension once per database (required before pgstatindex is available)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;pgstattuple&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pgstatindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'idx_events_user_ts'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- avg_leaf_density dropping significantly below its baseline is a signal worth investigating;&lt;/span&gt;
&lt;span class="c1"&gt;-- no single universal threshold applies, but sustained readings below ~70% are a commonly&lt;/span&gt;
&lt;span class="c1"&gt;-- cited starting point; treat it as a prompt to investigate trends, not a hard trigger&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When bloat reaches the point where rebuilds are warranted, most engines can do it online. PostgreSQL offers &lt;a href="https://www.postgresql.org/docs/current/sql-reindex.html" rel="noopener noreferrer"&gt;&lt;code&gt;REINDEX CONCURRENTLY&lt;/code&gt;&lt;/a&gt; (available since PostgreSQL 12); MySQL's InnoDB rebuilds indexes in-place via &lt;a href="https://dev.mysql.com/doc/refman/9.3/en/innodb-online-ddl-operations.html" rel="noopener noreferrer"&gt;&lt;code&gt;ALTER TABLE ... FORCE&lt;/code&gt;&lt;/a&gt; or &lt;a href="https://dev.mysql.com/doc/refman/9.3/en/optimize-table.html" rel="noopener noreferrer"&gt;&lt;code&gt;OPTIMIZE TABLE&lt;/code&gt;&lt;/a&gt;. How often you need to rebuild depends on write volume.&lt;/p&gt;

&lt;p&gt;Both engines include automatic maintenance, but the defaults assume moderate write loads. PostgreSQL's &lt;a href="https://www.postgresql.org/docs/current/routine-autovacuum.html" rel="noopener noreferrer"&gt;autovacuum&lt;/a&gt; fires when the fraction of dead rows in a table crosses &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt;, which defaults to 0.2 (20%). For a 1,000-row lookup table, that threshold is fine. For a 10-million-row events table, it means 2 million dead rows can accumulate before cleanup begins. MySQL's InnoDB purge thread handles dead-row cleanup continuously, but under heavy update workloads the purge lag (&lt;code&gt;History list length&lt;/code&gt; in &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt;) can grow faster than the thread drains it, producing similar bloat symptoms.&lt;/p&gt;

&lt;p&gt;In PostgreSQL, you can identify tables where autovacuum is falling behind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Identify tables where autovacuum is not keeping up&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_live_tup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_autovacuum&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_tables&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Override autovacuum threshold for a specific high-churn table (no restart required)&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autovacuum_vacuum_scale_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Now autovacuum fires after 1% dead rows instead of 20%&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Unused index audit
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/indexes-intro.html" rel="noopener noreferrer"&gt;Every index adds overhead to every write operation&lt;/a&gt; and the overhead compounds silently. The intro scenario's 23-index table is an extreme case, but smaller versions of the same problem are common. Auditing for indexes your query workload never uses is as important as adding new ones. In PostgreSQL, &lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_stat_user_indexes&lt;/code&gt;&lt;/a&gt; exposes &lt;code&gt;idx_scan&lt;/code&gt; counts per index.&lt;/p&gt;

&lt;p&gt;Any index with zero or near-zero scans after weeks of production traffic is a candidate for removal, with two caveats. First, make sure the index isn't enforcing a &lt;code&gt;UNIQUE&lt;/code&gt; constraint or Primary Key, since these do critical work enforcing data integrity on every write, even if never explicitly scanned by a &lt;code&gt;SELECT&lt;/code&gt;. Second, make sure your observation window doesn't miss heavy seasonal queries, such as end-of-month reporting or quarterly rollups.&lt;/p&gt;

&lt;p&gt;Indexing addresses the query path. The next layer is the infrastructure around it: connection management, data layout, and write throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing the infrastructure on which your queries run
&lt;/h2&gt;

&lt;p&gt;Indexing optimized the query path. Three infrastructure-level bottlenecks can negate those gains: connection exhaustion under load, scan costs that grow with table size despite correct indexes, and write latency amplified by row-at-a-time inserts. Each surface in your telemetry before it becomes a production incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection pooling and routing
&lt;/h3&gt;

&lt;p&gt;The contention pattern from the previous sections, where 140 connections were blocked on a table-level lock, often starts as a connection management problem. Most relational databases carry overhead per connection: process or thread creation, memory allocation, and authentication. In PostgreSQL, idle connections share most memory pages with the parent process via Copy-on-Write, but actual overhead ranges from under 2 MB (with huge pages and minimal prior activity) to over 10 MB, depending on &lt;code&gt;shared_buffers&lt;/code&gt; size and prior query activity. Active connections cost far more: &lt;code&gt;work_mem&lt;/code&gt; is allocated per sort or hash node in the query plan (default 4 MB each), so a complex query with multiple such nodes can consume a multiple of that figure. Connection poolers like &lt;a href="https://www.pgbouncer.org/" rel="noopener noreferrer"&gt;PgBouncer&lt;/a&gt; (PostgreSQL) and &lt;a href="https://proxysql.com/" rel="noopener noreferrer"&gt;ProxySQL&lt;/a&gt; (MySQL and PostgreSQL) multiplex many application connections onto a smaller pool of database connections.&lt;/p&gt;

&lt;p&gt;The architectural decision is the pooling mode. Session mode maps each application connection to a dedicated database connection for its lifetime, preserving session state (prepared statements, advisory locks). Transaction mode returns connections to the pool after each commit, enabling higher concurrency, but breaks any session-scoped feature. Audit your application's session-level usage before migrating modes. For read-heavy workloads with replicas, ProxySQL can route &lt;code&gt;SELECT&lt;/code&gt; queries to replicas and writes to the primary at the proxy layer. The trade-off is replication lag: reads immediately after writes may not reflect the latest state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table partitioning
&lt;/h3&gt;

&lt;p&gt;Your telemetry shows correct index usage, the planner picks the right index, row estimates are accurate, but execution time still grows month over month. The table itself is growing, and even a good index scan takes longer when the underlying B-tree is larger. &lt;a href="https://www.postgresql.org/docs/current/ddl-partitioning.html" rel="noopener noreferrer"&gt;Range partitioning on a timestamp column&lt;/a&gt; addresses this by enabling partition pruning: when a query includes a predicate on the partition key, the database scans only the relevant partitions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Parent table: partitioned by month on created_at&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;         &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;    &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt;     &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- One child partition per month&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events_2025_01&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
    &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-02-01'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A query filtering to the last 30 days on a table partitioned by month typically scans 2 partitions rather than the full table. The execution plan confirms pruning via a &lt;code&gt;Partitions&lt;/code&gt; field or equivalent. Teams typically automate partition maintenance (creating future partitions in advance and detaching old ones) with &lt;a href="https://github.com/pgpartman/pg_partman" rel="noopener noreferrer"&gt;&lt;code&gt;pg_partman&lt;/code&gt;&lt;/a&gt;, a PostgreSQL extension that manages partition creation and retention on a configurable schedule. Without this automation, &lt;code&gt;INSERT&lt;/code&gt; statements targeting a date range with no corresponding partition will fail at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Batch write throughput
&lt;/h3&gt;

&lt;p&gt;Row-at-a-time inserts pay two costs per statement: a network round trip to the server and index maintenance across every index on the table. Batching rows into a single &lt;code&gt;INSERT&lt;/code&gt; pays both costs once per statement instead of once per row. Hundreds to thousands of rows per statement typically deliver 10 to 20x throughput improvement on bulk loads, depending on row width and network latency.&lt;/p&gt;

&lt;p&gt;Each engine imposes a ceiling on batch size. SQL Server caps parameterized queries at 2,100 parameters. MySQL's &lt;code&gt;max_allowed_packet&lt;/code&gt; rejects oversized payloads and closes the connection entirely; check the current limit with &lt;code&gt;SHOW VARIABLES LIKE 'max_allowed_packet'&lt;/code&gt; and increase it globally in &lt;code&gt;my.cnf&lt;/code&gt; or via &lt;code&gt;SET GLOBAL max_allowed_packet = 134217728&lt;/code&gt; (existing connections pick up the new default on reconnection). PostgreSQL's extended query protocol caps any single parameterized statement at 65,535 bind parameters. In practice, chunking into batches of 1,000 to 5,000 rows is the sweet spot across all three engines.&lt;/p&gt;

&lt;p&gt;With the query path and infrastructure tuned, the remaining question is where automation can reduce the ongoing maintenance burden.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automating optimization and anomaly detection
&lt;/h2&gt;

&lt;p&gt;The telemetry pipeline, execution plan analysis, indexing strategy, and infrastructure tuning covered so far are manual disciplines. Each requires an engineer to interpret signals and decide on a change. Two categories of automation can reduce that burden without replacing the judgment behind it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workload-aware index recommendations
&lt;/h3&gt;

&lt;p&gt;Tools like &lt;a href="https://www.eversql.com/" rel="noopener noreferrer"&gt;EverSQL&lt;/a&gt; ingest production query logs or slow query exports, build a workload model from query fingerprints, simulate execution plans, and generate index recommendations ranked by estimated improvement. Some also suggest query rewrites. The value is prioritization: instead of manually reviewing &lt;code&gt;pg_stat_statements&lt;/code&gt; output to decide which query to optimize first, the tool ranks candidates by aggregate impact and proposes a specific structural change. But no recommendation should go straight to production. Treat these recommendations as a starting point, not a deployment-ready output. Check whether the recommended index covers a write-heavy table, since read performance gains come at the cost of write amplification across every &lt;code&gt;INSERT&lt;/code&gt; and &lt;code&gt;UPDATE&lt;/code&gt;. Confirm that any rewritten query produces identical results under edge-case data distributions, not just the common case the tool optimized for. &lt;/p&gt;

&lt;h3&gt;
  
  
  Anomaly detection on query metrics
&lt;/h3&gt;

&lt;p&gt;ML-based anomaly detection on time-series query execution metrics can flag plan regressions post-deployment without requiring manual baseline comparison. This addresses the intro scenario directly: the checkout endpoint's latency crept from 80ms to 900ms over two weeks, with no alert firing because no static threshold was breached. An anomaly detector trained on per-fingerprint latency distributions would flag a 10x deviation from the rolling baseline within hours, not weeks.&lt;/p&gt;

&lt;p&gt;This is more useful than static thresholds because it adapts to traffic patterns. A query that naturally runs slower during batch jobs at 2 AM shouldn't generate a 3 AM alert. However, effective anomaly detection requires long-term retention of per-fingerprint query metrics. You can build that on your database's built-in statistics views, on the external metrics store your OTel pipeline already feeds, or delegate it to a hosted tool with anomaly detection built in, such as &lt;a href="https://www.site24x7.com/database-monitoring.html" rel="noopener noreferrer"&gt;ManageEngine's database monitoring&lt;/a&gt;. The trade-off is where the telemetry sits and who retains it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managed database automation
&lt;/h3&gt;

&lt;p&gt;Cloud-managed databases increasingly bundle automatic index recommendations (&lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/database/automatic-tuning-overview" rel="noopener noreferrer"&gt;Azure SQL Database&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html" rel="noopener noreferrer"&gt;Amazon RDS Performance Insights&lt;/a&gt;) and compute auto-scaling. These autonomous features reduce operational overhead but operate within the bounds set by schema structure and access patterns, both of which require human decisions upstream. They handle the maintenance loop. They don't replace the diagnostic skill of reading an execution plan or the architectural judgment of choosing a partitioning strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a measurable feedback cycle
&lt;/h2&gt;

&lt;p&gt;Whether you automate parts of the maintenance cycle or handle every step manually, the principle is the same: every optimization needs a closed feedback loop to prove it worked.&lt;/p&gt;

&lt;p&gt;With the pipeline described in this guide, the opening scenario plays out differently. The &lt;code&gt;pg_stat_statements&lt;/code&gt; baseline catches the &lt;code&gt;mean_exec_time&lt;/code&gt; regression within a day. The &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; output reveals a 3,600x row estimate divergence, pointing to stale statistics after the schema migration. Running &lt;code&gt;ANALYZE&lt;/code&gt; on the affected table restores the correct execution plan. The unused index audit flags 19 of those 23 indexes as candidates for removal. The baseline dashboard confirms the fix: execution time drops, write throughput recovers, and the next regression, whenever it arrives, will surface in the same pipeline before a user files a ticket.&lt;/p&gt;

&lt;p&gt;The underlying shift is structural: from reacting to symptoms toward building a system that surfaces causes. Query-level telemetry provides the signals. Execution plan analysis reveals what the planner decided and whether it decided well. From there, indexing and infrastructure changes become the levers, and the baseline dashboard closes the loop by confirming whether pulling a lever worked. Each piece feeds the next.&lt;/p&gt;

&lt;p&gt;Database optimization is not a one-time project. It's a feedback loop. The teams that maintain fast, reliable databases over time are not the ones with the best indexing intuition. They're the ones whose instrumentation tells them where to look next. Start with &lt;code&gt;pg_stat_statements&lt;/code&gt; or Performance Schema, build the four-panel baseline, and let the data show you where your first optimization should land.&lt;/p&gt;

</description>
      <category>database</category>
      <category>monitoring</category>
      <category>performance</category>
      <category>sql</category>
    </item>
    <item>
      <title>Beyond Basic Indexes: Advanced Postgres Indexing for Maximum Supabase Performance</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Mon, 29 Sep 2025 11:12:18 +0000</pubDate>
      <link>https://dev.to/damasosanoja/beyond-basic-indexes-advanced-postgres-indexing-for-maximum-supabase-performance-3oj1</link>
      <guid>https://dev.to/damasosanoja/beyond-basic-indexes-advanced-postgres-indexing-for-maximum-supabase-performance-3oj1</guid>
      <description>&lt;p&gt;My Supabase application started with lightning-fast queries and smooth user interactions. Database operations felt instant, dashboards loaded in milliseconds, and search features responded immediately. Then reality hit: with tens of thousands of users and millions of rows, those same queries now took seconds to complete. That means user complaints and infrastructure costs.&lt;/p&gt;

&lt;p&gt;I wasn't facing a scaling issue - &lt;em&gt;I was experiencing a gap between my application's evolving complexity and my database's indexing strategy.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;While &lt;a href="https://supabase.com/docs/guides/database/postgres/indexes" rel="noopener noreferrer"&gt;basic B-tree indexes&lt;/a&gt; efficiently handle simple equality and range queries, they become performance liabilities when applications evolve beyond straightforward patterns. My app needed to handle &lt;a href="https://supabase.com/docs/guides/database/json" rel="noopener noreferrer"&gt;&lt;code&gt;jsonb&lt;/code&gt;&lt;/a&gt; document searches, array operations, function-based queries, and targeted filtering.&lt;/p&gt;

&lt;p&gt;Advanced Postgres indexing strategies—specifically &lt;a href="https://supabase.com/docs/guides/database/postgres/indexes" rel="noopener noreferrer"&gt;expression and partial indexes&lt;/a&gt;—transformed these performance bottlenecks into optimized operations. I also discovered specialized techniques like &lt;a href="https://www.postgresql.org/docs/current/gin.html" rel="noopener noreferrer"&gt;GIN (Generalized Inverted Index)&lt;/a&gt;, &lt;a href="https://supabase.com/docs/guides/database/extensions/postgis" rel="noopener noreferrer"&gt;GiST (Generalized Search Tree)&lt;/a&gt;, and &lt;a href="https://supabase.com/docs/guides/ai/vector-indexes/hnsw-indexes" rel="noopener noreferrer"&gt;HNSW (Hierarchical Navigable Small World)&lt;/a&gt; indexes for complex data types.&lt;/p&gt;

&lt;p&gt;Here are the strategies I used, with real-world examples and performance analysis that helped me maintain peak performance as my application scaled.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Supabase Uses Postgres's Native Indexing Capabilities
&lt;/h2&gt;

&lt;p&gt;Supabase's &lt;a href="https://supabase.com/docs/guides/database/extensions/index_advisor" rel="noopener noreferrer"&gt;Index Advisor&lt;/a&gt; efficiently identifies B-tree optimization opportunities, &lt;a href="https://supabase.com/docs/guides/database/extensions/pg_stat_statements" rel="noopener noreferrer"&gt;&lt;code&gt;pg_stat_statements&lt;/code&gt;&lt;/a&gt; reveals resource-hungry queries, and &lt;a href="https://supabase.com/docs/guides/database/extensions" rel="noopener noreferrer"&gt;additional database extensions can be enabled&lt;/a&gt; for advanced indexing scenarios.&lt;/p&gt;

&lt;p&gt;The performance challenge arises with the increasing complexity of modern application data patterns. &lt;code&gt;jsonb&lt;/code&gt; document queries, array-containment operations, full-text search, and geospatial lookups are sophisticated use cases that require equally sophisticated indexing strategies. No automated tool can fully solve these scenarios because they demand a contextual understanding of your specific data patterns, query frequency, and performance requirements.&lt;/p&gt;

&lt;p&gt;While Supabase provides tooling to identify optimization opportunities, there's a fundamental limitation that automated tools can't address—the default indexing approach that works for simple queries often breaks down completely with these complex operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Your B-Tree Indexes Are Failing Your Users (Original: Why Basic Indexes Aren't Enough)
&lt;/h2&gt;

&lt;p&gt;The core issue isn't your indexing strategy—it's that &lt;em&gt;B-tree indexes simply cannot handle the query patterns your users actually need.&lt;/em&gt; While B-trees excel at simple equality and range operations, they become performance liabilities when applications require complex data operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your performance bottlenecks are hiding in these common patterns:&lt;/strong&gt; &lt;code&gt;jsonb&lt;/code&gt; document queries represent the most severe blind spot. This user preference lookup appears innocent but triggers sequential scans on even moderately sized tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_profiles&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;preferences&lt;/span&gt; &lt;span class="o"&gt;@&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'{"notifications": true, "theme": "dark"}'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without a proper index on the &lt;code&gt;jsonb&lt;/code&gt; column, this query scales terribly—for instance, what executes in fifty milliseconds with ten thousand users could become a three-second operation with one hundred thousand users.&lt;/p&gt;

&lt;p&gt;Array operations suffer similarly. This product search query forces expensive table scans despite having a price index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'electronics'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'mobile'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The array overlap operator (&lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;) cannot utilize B-tree indexes, forcing Postgres to examine every row individually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The diagnostic evidence is already in your database:&lt;/strong&gt; Supabase's &lt;code&gt;pg_stat_statements&lt;/code&gt; extension reveals the issue through queries with high &lt;code&gt;total_exec_time&lt;/code&gt; and &lt;code&gt;shared_blks_read&lt;/code&gt; values, which indicate sequential scans where indexes should apply. These metrics don't lie—if your complex queries show massive block reads, you're hitting the B-tree ceiling.&lt;/p&gt;

&lt;p&gt;Consider this full-text search pattern becoming common as applications mature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;websearch_to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user search terms'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without proper indexing for full-text search, query times could increase exponentially with document count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cost isn't just slow queries:&lt;/strong&gt; Each inefficient query consumes excessive CPU and memory, reducing concurrent capacity. Users abandon slow searches, support tickets multiply, and infrastructure costs spiral as you throw hardware at software problems. Your Supabase application can handle complex data efficiently—but only if you escape B-tree limitations and implement the advanced indexing strategies your data patterns demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Expression Indexes: Optimizing Function-Based Queries
&lt;/h2&gt;

&lt;p&gt;Expression indexes solve the critical performance gap between how your application queries data and how Postgres can efficiently access it. When queries consistently apply functions or transformations to column values—such as case-insensitive comparisons, date extractions, or calculated fields—Postgres cannot utilize standard B-tree indexes because the index stores raw column values, not computed results.&lt;/p&gt;

&lt;p&gt;This diagram illustrates how expression indexes work, transforming inconsistent source data to a normalized index structure and enabling efficient queries on computed values rather than raw data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Table&lt;/span&gt; &lt;span class="k"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="n"&gt;Expression&lt;/span&gt; &lt;span class="k"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="n"&gt;Query&lt;/span&gt; &lt;span class="n"&gt;Optimization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="n"&gt;email&lt;/span&gt;                 &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;            
&lt;span class="nv"&gt;"John@EXAMPLE.com"&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;"john@example.com"&lt;/span&gt; &lt;span class="err"&gt;───┐&lt;/span&gt;  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john@example.com'&lt;/span&gt;
&lt;span class="nv"&gt;"mary@TEST.org"&lt;/span&gt;    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;"mary@test.org"&lt;/span&gt;    &lt;span class="err"&gt;───┼─&lt;/span&gt; &lt;span class="n"&gt;Fast&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;lookup&lt;/span&gt; &lt;span class="k"&gt;instead&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt;
&lt;span class="nv"&gt;"Bob@demo.NET"&lt;/span&gt;     &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;"bob@demo.net"&lt;/span&gt;     &lt;span class="err"&gt;───┘&lt;/span&gt;  &lt;span class="n"&gt;scanning&lt;/span&gt; &lt;span class="n"&gt;entire&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This scenario commonly occurs when email contacts are imported into your database from external sources with inconsistent casing. While forcing lowercase storage during import with lowercase comparison would be a cleaner and more efficient approach, expression indexes provide a powerful solution when you need to work with existing inconsistent data or when data normalization isn't feasible.&lt;/p&gt;

&lt;p&gt;Now that you understand what expression indexes accomplish, let's examine the technical mechanism that makes this optimization possible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Precomputing for Performance
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rovlt9o5u1xtizjvrqm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rovlt9o5u1xtizjvrqm.png" alt="Expression indexes" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Expression indexes work by precomputing and storing the results of specified functions or expressions during index creation. When Postgres encounters a query with a &lt;code&gt;WHERE&lt;/code&gt; clause that exactly matches the indexed expression, it can use this precomputed index for lightning-fast lookups instead of applying the function to every row during a sequential scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Problem: This query forces a sequential scan on every row&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john@example.com'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Solution: Create an expression index on the lowercased email&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_users_lower_email&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;-- Now this query uses the index for millisecond performance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john@example.com'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Identifying Expression Index Candidates in Supabase
&lt;/h3&gt;

&lt;p&gt;Your &lt;code&gt;pg_stat_statements&lt;/code&gt; data reveals queries with high execution times that consistently apply functions in &lt;code&gt;WHERE&lt;/code&gt; clauses. Look for patterns involving &lt;code&gt;LOWER()&lt;/code&gt;, &lt;code&gt;UPPER()&lt;/code&gt;, date functions like &lt;code&gt;EXTRACT()&lt;/code&gt;, mathematical calculations, or &lt;code&gt;jsonb&lt;/code&gt; path extractions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- High-impact candidate: User search by normalized phone numbers&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_users_normalized_phone&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;REGEXP_REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[^0-9]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'g'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Optimizes queries like:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;REGEXP_REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[^0-9]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'g'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'1234567890'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Critical Implementation Requirements
&lt;/h3&gt;

&lt;p&gt;Expression indexes demand &lt;em&gt;immutable&lt;/em&gt; functions—those guaranteed to return identical results for identical inputs without side effects. Postgres enforces this restriction to maintain index consistency. Here's how it works in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Valid: Date extraction from timestamps&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_year&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;-- Optimizes year-based reporting queries&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2024&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Invalid: NOW() is not immutable (changes over time)&lt;/span&gt;
&lt;span class="c1"&gt;-- CREATE INDEX invalid_idx ON events (created_at - NOW()); -- This fails&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  jsonb Path Extraction for Supabase Applications
&lt;/h3&gt;

&lt;p&gt;For applications storing flexible data structures in &lt;code&gt;jsonb&lt;/code&gt; columns, expression indexes on frequently accessed paths provide dramatic performance improvements for equality and range queries. The following example demonstrates two common patterns for optimizing &lt;code&gt;jsonb&lt;/code&gt; queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- SaaS application: Index user preference values&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_user_preferences_theme&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;user_profiles&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;preferences&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'theme'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Fast lookups for users with specific preferences&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_profiles&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preferences&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'theme'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'dark'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- E-commerce: Index calculated discount percentages&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_products_discount_rate&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="n"&gt;original_price&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sale_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;original_price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sale_price&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;original_price&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expression indexes transform function-heavy queries from performance bottlenecks into optimized operations, but they require careful consideration of write overhead and exact query matching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Partial Indexes: Targeting Specific Data Subsets
&lt;/h2&gt;

&lt;p&gt;Partial indexes represent a surgical approach to database optimization, addressing the fundamental inefficiency of indexing data you rarely query. By including only rows that satisfy a specific &lt;code&gt;WHERE&lt;/code&gt; condition in the index, partial indexes deliver dramatically smaller index sizes, reduced maintenance overhead, and laser-focused performance for your most critical query patterns.&lt;/p&gt;

&lt;p&gt;This diagram illustrates the dramatic size reduction when only specific rows are indexed based on query patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Full&lt;/span&gt; &lt;span class="k"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;                    &lt;span class="k"&gt;Partial&lt;/span&gt; &lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="err"&gt;┌─────────────────────────┐&lt;/span&gt;   &lt;span class="err"&gt;┌─────────────────┐&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'active'&lt;/span&gt;  &lt;span class="err"&gt;│──&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;indexed&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'inactive'&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt;                 &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;96&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;smaller&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt;                 &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;Faster&lt;/span&gt; &lt;span class="n"&gt;scans&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'active'&lt;/span&gt;  &lt;span class="err"&gt;│──&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;indexed&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;Reduced&lt;/span&gt; &lt;span class="n"&gt;maintenance&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'canceled'&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt;                 &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'active'&lt;/span&gt;  &lt;span class="err"&gt;│──&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;Row&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;indexed&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;        &lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;└─────────────────┘&lt;/span&gt;
&lt;span class="err"&gt;└─────────────────────────┘&lt;/span&gt;   &lt;span class="k"&gt;Only&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt; &lt;span class="n"&gt;indexed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following sections demonstrate how to implement this selective indexing approach, starting with the core benefits and progressing through identification strategies, technical requirements, and advanced patterns for complex scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Precision Over Breadth
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp9gpf3lesbxz0yqou27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp9gpf3lesbxz0yqou27.png" alt="Partial indexes" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Traditional indexes include every row in a table, but partial indexes target specific subsets that align with your application's access patterns. This precision yields indexes that are orders of magnitude smaller—and correspondingly faster—while consuming fewer resources during write operations.&lt;/p&gt;

&lt;p&gt;Here's a practical comparison showing the difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Problem: Indexing all orders when you primarily query active ones&lt;/span&gt;
&lt;span class="c1"&gt;-- Full index includes millions of completed/cancelled orders&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_customer_full&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Solution: Partial index targets only operationally relevant orders&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_active_customer&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- This query now uses a dramatically smaller, faster index&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user_123'&lt;/span&gt; 
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Identifying Partial Index Opportunities
&lt;/h3&gt;

&lt;p&gt;Analyze your query patterns for consistent filtering conditions that significantly reduce the result set. Common patterns include status-based filtering (active/inactive), temporal constraints (recent records), and priority-based queries (high-priority items). The following are examples of status-based and temporal filtering patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- SaaS application: Active subscriptions represent a tiny fraction of the total&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_subscriptions_active_user&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- E-commerce: Recent orders for customer service and fulfillment&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_recent_processing&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt; 
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'cancelled'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Query Planner Predicate Matching
&lt;/h3&gt;

&lt;p&gt;For the Postgres query planner to utilize a partial index, it must determine that your query's &lt;code&gt;WHERE&lt;/code&gt; clause is logically implied by the index's predicate. This requires exact matches or mathematically provable implications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Partial index predicate&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_high_value_transactions&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- These queries CAN use the index:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'abc'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;-- Exact match&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'abc'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;-- Implies amount &amp;gt; 1000&lt;/span&gt;

&lt;span class="c1"&gt;-- This query CANNOT use the index:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'abc'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;-- Doesn't imply amount &amp;gt; 1000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Advanced Partial Index Patterns
&lt;/h3&gt;

&lt;p&gt;You can also combine partial indexes with expressions for maximum optimization impact, targeting both data subsets and computed values simultaneously. Here are three advanced patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Multi-tenant SaaS: Index active tenant data with normalized identifiers&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_tenant_data_active_normalized&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;tenant_data&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_slug&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
    &lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Unique constraints on subsets: One active subscription per user&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_unique_active_subscription&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Error tracking: Index only failed events with extracted error codes&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_events_error_codes&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'error_code'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'error_code'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Partial indexes transform broad, resource-intensive indexing strategies into focused, high-performance solutions that align database resources with actual application usage patterns, setting the foundation for exploring additional advanced indexing techniques.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Advanced Indexing Techniques in Postgres
&lt;/h2&gt;

&lt;p&gt;Beyond expression and partial indexes, Postgres offers specialized indexing methods that address specific data types and query patterns common in modern Supabase applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  GIN for Composite Data
&lt;/h3&gt;

&lt;p&gt;GIN indexes excel at indexing composite data types where individual items contain multiple searchable elements. Unlike B-tree indexes, which store complete values, GIN employs an inverted index approach that maps content (such as words, elements, or keys) to the locations (row IDs) where that content appears. This makes them essential for &lt;code&gt;jsonb&lt;/code&gt; document queries, array operations, and full-text search scenarios that B-tree indexes cannot handle efficiently.&lt;/p&gt;

&lt;p&gt;Here are two common GIN index patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- JSONB containment queries&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_user_preferences_gin&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;user_profiles&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preferences&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Optimizes: WHERE preferences @&amp;gt; '{"theme": "dark"}'&lt;/span&gt;

&lt;span class="c1"&gt;-- Array overlap operations  &lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_product_tags_gin&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Optimizes: WHERE tags &amp;amp;&amp;amp; ARRAY['electronics', 'mobile']&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GiST for Complex Types
&lt;/h3&gt;

&lt;p&gt;GiST indexes provide a flexible framework for indexing geometric data and range types and enabling nearest-neighbor searches. They're particularly valuable for Supabase applications using &lt;a href="https://postgis.net/" rel="noopener noreferrer"&gt;PostGIS&lt;/a&gt; for geospatial functionality.&lt;/p&gt;

&lt;p&gt;Here are two GiST patterns for geospatial and temporal data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Geospatial queries with PostGIS&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_locations_geom&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;locations&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GiST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Optimizes: WHERE geom &amp;amp;&amp;amp; ST_MakeEnvelope(lng1, lat1, lng2, lat2, 4326)&lt;/span&gt;

&lt;span class="c1"&gt;-- Range type overlaps&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_events_timerange&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GiST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time_period&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Optimizes: WHERE time_period &amp;amp;&amp;amp; '[2024-01-01, 2024-01-31]'::tsrange&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  HNSW Indexes for Vector Similarity Search
&lt;/h3&gt;

&lt;p&gt;For AI and machine learning applications storing vector embeddings, Postgres's &lt;code&gt;pgvector&lt;/code&gt; extension provides HNSW indexes optimized for high-dimensional similarity searches.&lt;/p&gt;

&lt;p&gt;HNSW indexes work by creating a multilayered graph structure where each layer contains increasingly fewer nodes, allowing for efficient navigation from coarse to fine-grained similarity matches. This hierarchical approach enables fast approximate nearest-neighbor searches in high-dimensional vector spaces. Here's the basic implementation pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Vector embeddings for semantic search&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_documents_embedding&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Optimizes: ORDER BY embedding  &amp;lt;=&amp;gt; query_vector LIMIT 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HNSW indexes excel at k-nearest neighbor queries but &lt;a href="https://opensearch.org/blog/a-practical-guide-to-selecting-hnsw-hyperparameters/" rel="noopener noreferrer"&gt;require careful consideration of key parameters&lt;/a&gt;. The &lt;code&gt;m&lt;/code&gt; parameter controls the number of bidirectional links each node maintains, affecting the recall-performance balance—higher values improve search quality but increase memory usage and build time. The &lt;code&gt;ef_construction&lt;/code&gt; parameter determines the size of the candidate list during index construction, where larger values create higher-quality indexes at the cost of longer build times.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Trade-Offs
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/textsearch-indexes.html" rel="noopener noreferrer"&gt;GIN indexes offer faster lookups&lt;/a&gt; but require longer build times and consume more storage. They're optimal for static data with frequent reads. GiST indexes provide faster updates and a smaller storage footprint, making them suitable for dynamic data scenarios.&lt;/p&gt;

&lt;p&gt;HNSW indexes deliver excellent performance for vector similarity searches but involve trade-offs between search accuracy and speed. Higher &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;ef_construction&lt;/code&gt; values improve recall but significantly increase index size and build time, making parameter tuning essential for production deployments.&lt;/p&gt;

&lt;p&gt;These advanced indexing strategies &lt;a href="https://supabase.com/docs/guides/database/debugging-performance" rel="noopener noreferrer"&gt;require manual implementation and validation&lt;/a&gt; using &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; as Supabase's Index Advisor currently focuses only on B-tree recommendations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing Your Indexing Strategy
&lt;/h2&gt;

&lt;p&gt;Having explored the various advanced indexing techniques and their practical applications, you need to understand how to choose the right strategy for your specific use case. Here's a simple decision framework that can help you determine which index is best suited for your needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expression:&lt;/strong&gt; When queries consistently apply functions (&lt;code&gt;LOWER&lt;/code&gt;, &lt;code&gt;EXTRACT&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partial:&lt;/strong&gt; When over 80 percent of queries target specific data subsets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GIN:&lt;/strong&gt; When working with &lt;code&gt;jsonb&lt;/code&gt;, arrays, or full-text search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GiST:&lt;/strong&gt; When dealing with geospatial data or range types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With you having explored the full spectrum of advanced indexing options, the question becomes this: How do you systematically implement and validate these techniques?&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices for Indexing in Postgres
&lt;/h2&gt;

&lt;p&gt;Effective indexing requires a strategic approach that balances query performance gains against write overhead and maintenance costs. These best practices guide you through systematic index evaluation, performance measurement, and overhead management to ensure your Supabase application scales efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using EXPLAIN ANALYZE to Measure Performance
&lt;/h3&gt;

&lt;p&gt;Before creating any index, capture baseline performance using &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; to document current execution plans, costs, and actual execution times. This baseline enables accurate measurement of indexing impact. Here is an example of capturing baseline performance for a typical query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Capture baseline performance&lt;/span&gt;
&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12345&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you establish this baseline, follow a systematic process to validate the effectiveness of your new index. Below is an example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Create the index&lt;/em&gt; (use &lt;code&gt;CONCURRENTLY&lt;/code&gt; in production):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;idx_orders_customer_status&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Update table statistics&lt;/em&gt; to ensure the planner recognizes the new index:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Rerun &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;&lt;/em&gt; on the same query and compare results:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12345&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you're comparing performance before and after index creation, focus on these critical metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Actual execution time:&lt;/strong&gt; Look for significant reductions in total query time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scan type changes:&lt;/strong&gt; Sequential scans should become index or bitmap heap scans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows examined:&lt;/strong&gt; Verify that the index reduces the number of rows processed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer activity:&lt;/strong&gt; A lower &lt;code&gt;shared_blks_read&lt;/code&gt; indicates reduced I/O.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Successful indexing typically shows noticeable execution time reductions for well-targeted queries, with scan types changing from &lt;code&gt;Seq Scan&lt;/code&gt; to &lt;code&gt;Index Scan&lt;/code&gt; or &lt;code&gt;Bitmap Heap Scan&lt;/code&gt; using your new index.&lt;/p&gt;

&lt;h3&gt;
  
  
  Criteria for Creating Indexes
&lt;/h3&gt;

&lt;p&gt;Effective indexing requires strategic prioritization and alignment with query patterns and data types. Here are some best practices to guide your indexing decisions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Target high-impact queries first:&lt;/strong&gt; Focus indexing efforts on queries identified through &lt;code&gt;pg_stat_statements&lt;/code&gt; that exhibit high &lt;code&gt;total_exec_time&lt;/code&gt;, frequent &lt;code&gt;calls&lt;/code&gt;, or excessive &lt;code&gt;shared_blks_read&lt;/code&gt; values. Prioritize queries that combine high frequency with slow execution times—a query executed ten thousand times daily with a fifty-millisecond average latency has a greater impact than one executed ten times with a five-hundred-millisecond latency.&lt;/p&gt;

&lt;p&gt;To identify these high-impact queries, you can run the following SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Identify high-impact queries using pg_stat_statements&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_exec_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean_exec_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shared_blks_read&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_statements&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_exec_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Data type and query pattern alignment:&lt;/strong&gt; Match index types to data characteristics and query patterns. Use B-tree indexes for scalar equality and range queries, GIN indexes for &lt;code&gt;jsonb&lt;/code&gt; containment and array operations, GiST indexes for geospatial queries and full-text search on dynamic data, and partial indexes when queries consistently target specific data subsets.&lt;/p&gt;

&lt;p&gt;The following is an example of index creation for common query patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- JSONB queries: Use GIN indexes&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_user_preferences_gin&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;user_profiles&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preferences&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Geospatial queries: Use GiST indexes  &lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_locations_geom&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;locations&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GiST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Frequent subset queries: Use partial indexes&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_active_subscriptions&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Selectivity and cardinality considerations:&lt;/strong&gt; Create indexes on columns with high selectivity (many distinct values) for equality queries and moderate selectivity for range queries. Avoid indexing columns with extremely low cardinality (like Boolean flags) unless combined with other columns or used in partial indexes targeting minority cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Write Overhead from Excessive Indexing
&lt;/h3&gt;

&lt;p&gt;Every index introduces write overhead because Postgres must update the index structure for each &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, or &lt;code&gt;DELETE&lt;/code&gt; operation that affects indexed columns. The &lt;code&gt;pganalyze&lt;/code&gt; model estimates this overhead as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;write_overhead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index_entry_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;row_size&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;partial_index_selectivity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This represents additional bytes written to maintain indexes per byte written to the table.&lt;/p&gt;

&lt;p&gt;To understand the practical consequences of excessive indexing, let's take a look at some real-world benchmark data that illustrates why careful index management is essential:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quantifying overindexing impact:&lt;/strong&gt; Real-world benchmarks demonstrate severe performance degradation from excessive indexing. &lt;a href="https://www.percona.com/blog/benchmarking-postgresql-the-hidden-cost-of-over-indexing/" rel="noopener noreferrer"&gt;One study&lt;/a&gt; showed that increasing indexes from seven to thirty-nine across a schema resulted in &lt;em&gt;a 58 percent reduction in transactions per second&lt;/em&gt; (1,400 TPS to 600 TPS) and &lt;em&gt;a transaction-latency increase&lt;/em&gt; from eleven milliseconds to twenty-six milliseconds average.&lt;/p&gt;

&lt;p&gt;This degradation compounds in write-heavy Supabase applications, making selective indexing critical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identifying and removing unused indexes:&lt;/strong&gt; Regularly audit for unused indexes that provide no query benefit but continue imposing write overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Using Supabase CLI&lt;/span&gt;
&lt;span class="n"&gt;supabase&lt;/span&gt; &lt;span class="n"&gt;inspect&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="n"&gt;unused&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;indexes&lt;/span&gt;

&lt;span class="c1"&gt;-- Or query pg_stat_user_indexes directly&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indexname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx_scan&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_indexes&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;idx_scan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indexname&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For applications with high write volumes, consider the following strategies to manage indexing effectively and reduce write overhead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prioritize partial indexes&lt;/em&gt; to minimize the subset of writes requiring index updates.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Combine multiple query needs&lt;/em&gt; into a single multicolumn index rather than creating multiple single-column indexes.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Consider deferred indexing&lt;/em&gt; for batch processing scenarios where indexes can be dropped during bulk operations and recreated afterward.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Monitor pg_stat_statements&lt;/em&gt; for queries with high &lt;code&gt;total_plan_time&lt;/code&gt;, which can indicate excessive index evaluation overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to achieve surgical precision by creating indexes that provide substantial query performance improvements while minimizing unnecessary write overhead that could degrade overall application throughput.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation-Priority Framework
&lt;/h3&gt;

&lt;p&gt;Here's a simple framework that can help you to prioritize your optimization efforts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;High impact, low risk:&lt;/strong&gt; Partial indexes on status columns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medium impact, medium risk:&lt;/strong&gt; Expression indexes for case-insensitive searches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High impact, high complexity:&lt;/strong&gt; GIN indexes for &lt;code&gt;jsonb&lt;/code&gt; queries&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Advanced Postgres indexing strategies transform Supabase applications from performance bottlenecks into high-speed, scalable systems. Expression indexes eliminate sequential scan penalties for function-based queries, while partial indexes provide surgical precision while reducing size and write overhead. GIN indexes unlock JSONB and array operations, GiST indexes enable geospatial queries, and HNSW indexes power AI applications with vector similarity search.&lt;/p&gt;

&lt;p&gt;While Supabase's Index Advisor handles basic B-tree optimization, real-world performance demands the manual implementation of these advanced techniques. Strategic indexing decisions—knowing when a partial index on active records outperforms a full table index or when a GIN index eliminates &lt;code&gt;jsonb&lt;/code&gt; query bottlenecks—separate applications that struggle under load from those that scale effortlessly.&lt;/p&gt;

&lt;p&gt;Mastering these techniques delivers compound benefits—faster queries improve user experience, reduced resource consumption controls costs, and scalable architecture prevents technical debt accumulation.&lt;/p&gt;

</description>
      <category>supabase</category>
      <category>postgressql</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Data Integrity First: Mastering Transactions in Supabase SQL for Reliable Applications</title>
      <dc:creator>Damaso Sanoja</dc:creator>
      <pubDate>Tue, 23 Sep 2025 11:42:39 +0000</pubDate>
      <link>https://dev.to/damasosanoja/data-integrity-first-mastering-transactions-in-supabase-sql-for-reliable-applications-2dbb</link>
      <guid>https://dev.to/damasosanoja/data-integrity-first-mastering-transactions-in-supabase-sql-for-reliable-applications-2dbb</guid>
      <description>&lt;p&gt;Transferring $500 between bank accounts, reserving the last seat on a flight, updating inventory after a flash-sale checkout—all of these operations require multiple SQL statements that must execute as a single, indivisible unit, and any glitch can corrupt your data. Database transactions exist to stop that from happening. By wrapping related statements into an all-or-nothing unit, Postgres ensures that balances, orders, and records remain consistent, regardless of the traffic or network conditions.&lt;/p&gt;

&lt;p&gt;But relying on these safeguards isn't as simple as sprinkling &lt;code&gt;BEGIN&lt;/code&gt; and &lt;code&gt;COMMIT&lt;/code&gt; into your code. You still have to address challenges like &lt;a href="https://www.linkedin.com/pulse/race-condition-database-trong-luong-van-9fsuc/" rel="noopener noreferrer"&gt;race conditions&lt;/a&gt;, &lt;a href="https://www.postgresql.org/docs/current/ddl-constraints.html" rel="noopener noreferrer"&gt;constraint violations&lt;/a&gt;, and mid-transaction failures across API layers. &lt;a href="https://supabase.com/" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; helps solve these issues by building on Postgres and handling the transaction logic directly at the database itself. It exposes the logic through streamlined interfaces that preserve data integrity without the usual middleware complexity.&lt;/p&gt;

&lt;p&gt;In this guide, I'll explain how transaction-consistency guarantees in Postgres actually work, show you manual and programmatic transaction patterns I've used, how to handle concurrency with isolation controls and row-level locks, and teach you how to build data integrity into your application using &lt;a href="https://github.com/orgs/supabase/discussions/526#discussioncomment-12190267" rel="noopener noreferrer"&gt;Supabase's database-first approach&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Transactions in Postgres
&lt;/h2&gt;

&lt;p&gt;In Postgres, a &lt;a href="https://www.postgresql.org/docs/current/tutorial-transactions.html" rel="noopener noreferrer"&gt;transaction&lt;/a&gt; is a logical unit of work that groups one or more database operations together to represent a complete business process or workflow. For example, transferring money between bank accounts involves multiple operations (debiting one account, crediting another) that logically belong together as a single business transaction.&lt;/p&gt;

&lt;p&gt;To ensure reliable and consistent data processing, Postgres provides specific guarantees for the execution of transactions through "&lt;a href="https://www.postgresql.org/docs/17/glossary.html#GLOSSARY-ACID" rel="noopener noreferrer"&gt;ACID&lt;/a&gt; compliance." This means that every transaction automatically follows four properties: &lt;em&gt;atomicity&lt;/em&gt;, &lt;em&gt;consistency&lt;/em&gt;, &lt;em&gt;isolation&lt;/em&gt;, and &lt;em&gt;durability&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Atomicity&lt;/em&gt; ensures that all operations within a transaction either complete successfully together or fail together as a single unit. In our bank-transfer example, if a transfer of $500 from one account to another encounters any failure (such as insufficient funds, invalid account numbers, or system errors), the entire transaction rolls back, ensuring that money is never debited from the sender's account without being credited to the receiver's account.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Consistency&lt;/em&gt; ensures data-integrity rules and business constraints are maintained throughout the transaction. In our bank-transfer scenario, consistency ensures that account balances never become negative, account numbers remain valid, and the total money in the system stays the same—if $500 leaves one account, exactly $500 must arrive in another account, preserving the fundamental accounting principle that debits must equal credits.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Isolation&lt;/em&gt; prevents concurrent transactions from interfering with each other during execution. In our bank-transfer example, if multiple transfers involving the same accounts happen simultaneously, isolation ensures that each transaction sees a consistent view of account balances and prevents race conditions where concurrent transfers might result in incorrect final balances or overdrafts.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Durability&lt;/em&gt; guarantees that once a transaction is committed, the changes persist permanently even in the face of system failures. In our bank-transfer scenario, once the transfer completes successfully, the updated account balances are permanently stored and will survive power outages, system crashes, or hardware failures—ensuring that the financial transaction cannot be lost or reversed due to technical issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  ACID Rigor in Practice: When Full Compliance Matters
&lt;/h3&gt;

&lt;p&gt;While Postgres is inherently designed to provide robust ACID compliance for all transactions, the &lt;em&gt;degree&lt;/em&gt; of transactional rigor, particularly concerning &lt;em&gt;isolation&lt;/em&gt;, can be tailored to specific application needs. This flexibility allows developers to balance strong consistency guarantees with performance and concurrency requirements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/transaction-iso.html" rel="noopener noreferrer"&gt;Postgres offers several isolation levels&lt;/a&gt; to achieve this balance, with &lt;code&gt;READ COMMITTED&lt;/code&gt; providing a good default for many applications and &lt;code&gt;SERIALIZABLE&lt;/code&gt; offering the highest level of strictness; we will delve into these specific isolation levels and their implications in detail later in this guide.&lt;/p&gt;

&lt;p&gt;For now, all you need to know is that choosing the appropriate isolation level within Postgres depends on your specific use case and its tolerance for certain types of temporary inconsistencies.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Highest Isolation Required (&lt;em&gt;eg&lt;/em&gt; SERIALIZABLE)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Relaxed Isolation Acceptable (&lt;em&gt;eg&lt;/em&gt; READ COMMITTED)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Financial Systems:&lt;/strong&gt; Money transfers require complete isolation to prevent phenomena like phantom reads (new rows appearing in repeated queries) or nonrepeatable reads (same query returning different results) during complex calculations or audits.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Social Media Feeds:&lt;/strong&gt; Displaying like counts or follower numbers can tolerate slight delays or inconsistencies in real time as long as the data eventually settles.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Healthcare Records:&lt;/strong&gt; Patient charts need absolute isolation to prevent simultaneous updates from overwriting critical medication dosages or treatment notes, ensuring data integrity across a session.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Content Management:&lt;/strong&gt; Blog-post view counts or comment threads can tolerate brief inconsistencies during high traffic periods, where exact real-time accuracy isn't paramount.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Inventory Management:&lt;/strong&gt; Order processing requires the highest consistency and isolation to prevent accepting orders for nonexistent items, avoiding unfulfillable orders in highly concurrent environments.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Analytics Dashboards:&lt;/strong&gt; Metrics aggregation can use data that might be slightly stale or experience minor inconsistencies from concurrent writes, as exact real-time precision isn't critical for trend analysis.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Booking Systems:&lt;/strong&gt; Hotel or flight reservations need strict serializable consistency to prevent overbooking scenarios, ensuring that concurrent booking attempts behave as if they happened one after another.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Recommendation Engines:&lt;/strong&gt; Product suggestions can work with slightly stale user-preference data without significantly degrading user experience, as long as updates eventually propagate.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For applications that fall into the "highest isolation required" category, implementing these strict transactional guarantees becomes paramount to system reliability and data integrity within Postgres.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simplifying ACID: Supabase's Database-First Approach
&lt;/h3&gt;

&lt;p&gt;Effectively using Postgres's native ACID capabilities for complex business logic in modern applications often introduces significant architectural and development challenges. This is because developers typically need to implement extensive &lt;em&gt;middleware solutions&lt;/em&gt;—intricate application-level code to manually orchestrate transaction boundaries, handle errors, and ensure atomicity across multiple database operations or API calls.&lt;/p&gt;

&lt;p&gt;Here, you could use something like Supabase, an open source Firebase alternative, to extend Postgres capabilities with a "database-first architecture."&lt;/p&gt;

&lt;p&gt;Common business logic is encapsulated as &lt;a href="https://www.ibm.com/docs/en/aix/7.3.0?topic=concepts-remote-procedure-call" rel="noopener noreferrer"&gt;remote procedure calls (RPCs)&lt;/a&gt; directly within the database (&lt;em&gt;eg&lt;/em&gt; as Postgres functions). Postgres functions execute atomically by design, while Supabase's role is to provide an RPC mechanism to invoke these functions as single, indivisible transactions. This means developers no longer need to write cumbersome application-level code. Instead, the robust ACID guarantees of Postgres are fully utilized directly at the data layer, significantly simplifying application architecture, reducing potential failure points, and inherently ensuring data integrity, allowing developers to fully rely on the database's native transactional power.&lt;/p&gt;

&lt;p&gt;In the next section, I'll explore how to implement these transaction controls through Supabase and see the database-first approach in action.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing and Executing Transactions in Supabase
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://supabase.com/docs/guides/database/overview" rel="noopener noreferrer"&gt;Supabase's Postgres foundation&lt;/a&gt; provides direct access to transaction control through three fundamental commands: &lt;a href="https://www.postgresql.org/docs/current/sql-begin.html" rel="noopener noreferrer"&gt;&lt;code&gt;BEGIN&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://www.postgresql.org/docs/current/sql-commit.html" rel="noopener noreferrer"&gt;&lt;code&gt;COMMIT&lt;/code&gt;&lt;/a&gt;, and &lt;a href="https://www.postgresql.org/docs/current/sql-rollback.html" rel="noopener noreferrer"&gt;&lt;code&gt;ROLLBACK&lt;/code&gt;&lt;/a&gt;. While the examples in this guide demonstrate these concepts using banking scenarios, the patterns apply universally—whether you're managing e-commerce inventory, healthcare records, social media content, or any application requiring data consistency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Transaction Structure
&lt;/h3&gt;

&lt;p&gt;Every manual Postgres transaction follows this pattern in Supabase's SQL editor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Marks the start of a new transaction&lt;/span&gt;
&lt;span class="c1"&gt;-- Your SQL operations here (these changes are temporary until committed)&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Makes all changes permanent and ends the transaction&lt;/span&gt;
&lt;span class="c1"&gt;-- OR&lt;/span&gt;
&lt;span class="c1"&gt;-- ROLLBACK; -- Cancels all changes made since BEGIN and ends the transaction&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure creates a transaction boundary that treats all enclosed operations as a single unit. The &lt;code&gt;BEGIN&lt;/code&gt; statement opens the transaction, operations execute within this protected context, and &lt;code&gt;COMMIT&lt;/code&gt; makes all changes permanent. If any operation fails, &lt;code&gt;ROLLBACK&lt;/code&gt; cancels everything, returning the database to its pretransaction state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simple Transfer Example
&lt;/h3&gt;

&lt;p&gt;Here's a simple money-transfer scenario that demonstrates the core transaction workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Start the transaction&lt;/span&gt;

&lt;span class="c1"&gt;-- Debit the sender's account&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Credit the receiver's account&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Finalize both operations together&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This transaction performs two critical operations: It debits one account and credits another.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Crucially, this explicit transaction wrapper is vital when multiple operations are logically interdependent.&lt;/em&gt; Without grouping these two &lt;a href="https://www.postgresql.org/docs/current/sql-update.html" rel="noopener noreferrer"&gt;&lt;code&gt;UPDATE&lt;/code&gt;&lt;/a&gt; statements into a single transaction, a system failure &lt;em&gt;between&lt;/em&gt; them could lead to data inconsistency—money might disappear from the first account without ever reaching the second, as each &lt;code&gt;UPDATE&lt;/code&gt; would commit independently.&lt;/p&gt;

&lt;p&gt;The same principle applies to any application requiring coordinated updates, such as inventory transfers between warehouses, moving tasks between project phases, or updating user profiles across multiple tables. The transaction ensures either all related changes succeed together or none occur at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  Controlled-Rollback Example
&lt;/h3&gt;

&lt;p&gt;Transactions provide manual control over when to cancel operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Begin a new transaction&lt;/span&gt;

&lt;span class="c1"&gt;-- Attempt to deduct money&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-003'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check the hypothetical new balance (for illustrative purposes; typically, logic would be in application)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-003'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- If the business logic determines this update is invalid (e.g., overdraft), cancel it&lt;/span&gt;
&lt;span class="k"&gt;ROLLBACK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Explicitly cancels the UPDATE operation and ends the transaction&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern demonstrates conditional transaction control. After performing an operation within the transaction, you can inspect the results and decide whether to &lt;code&gt;COMMIT&lt;/code&gt; or &lt;code&gt;ROLLBACK&lt;/code&gt; based on business logic.&lt;/p&gt;

&lt;p&gt;In e-commerce applications, this might involve checking inventory levels after a reservation; in content management, verifying user permissions after access changes; and in healthcare systems, validating dosage calculations after prescription updates. The ability to cancel transactions based on intermediate results prevents invalid data states from persisting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multitable Transaction Coordination
&lt;/h3&gt;

&lt;p&gt;Complex business operations often require coordinating changes across multiple tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Initiate a transaction for interdependent operations&lt;/span&gt;

&lt;span class="c1"&gt;-- Transfer money between accounts in the 'accounts' table&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-004'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Log the transaction details in a separate 'transactions' audit table&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from_account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transaction_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;-- Get sender's ID&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-004'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;-- Get receiver's ID&lt;/span&gt;
  &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'transfer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'completed'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Commit all three operations as one atomic unit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example coordinates three distinct operations: two balance updates and one audit log insertion.&lt;/p&gt;

&lt;p&gt;The transaction ensures that if the audit logging fails for any reason, the financial transfer also gets cancelled, maintaining perfect synchronization between your primary data and supporting records. This pattern is essential in any application where maintaining data relationships across tables is critical—order-processing systems that update inventory, customer records, and shipping tables simultaneously; user management systems that modify permissions, log changes, and update caches together; or content publishing workflows that update articles, search indexes, and notification queues as atomic units.&lt;/p&gt;

&lt;p&gt;The direct SQL approach shown above works excellently for straightforward scenarios, but what happens when operations fail unexpectedly and you need sophisticated automatic rollback handling?&lt;/p&gt;

&lt;h3&gt;
  
  
  Automatic Rollback on Constraint Violations
&lt;/h3&gt;

&lt;p&gt;When operations violate database constraints, Postgres automatically cancels the entire transaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Start the transaction&lt;/span&gt;

&lt;span class="c1"&gt;-- Attempt to debit an account (this line will likely violate a CHECK constraint like 'positive_balance')&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-004'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- This update will NOT execute if the previous one fails and rolls back the transaction&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- This COMMIT will never be reached if an earlier error occurred&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This transaction attempts to withdraw $1,500 from an account with $0 balance. The first &lt;code&gt;UPDATE&lt;/code&gt; violates our &lt;code&gt;positive_balance&lt;/code&gt; constraint (assuming one exists), triggering an automatic rollback that prevents both updates from executing. Without this protection, the second account would receive money that never left the first account, creating phantom funds in your system.&lt;/p&gt;

&lt;p&gt;The same principle protects any application with data-validation rules—e-commerce systems preventing overselling inventory, healthcare applications blocking invalid dosage combinations, or content management systems enforcing publishing workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manual Rollback for Business Logic Validation
&lt;/h3&gt;

&lt;p&gt;Sometimes, business rules require custom validation that database constraints cannot enforce:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Start a new transaction&lt;/span&gt;

&lt;span class="c1"&gt;-- Attempt the transfer operations&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-003'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check a custom business rule (e.g., if this exceeds a daily transfer limit for ACC-002)&lt;/span&gt;
&lt;span class="c1"&gt;-- Note: This SELECT would typically be part of a larger function/application logic.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;daily_total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;from_account_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Assume application logic determines that the daily_total (if retrieved) exceeds $1000.&lt;/span&gt;
&lt;span class="c1"&gt;-- Based on that external check, we manually cancel the transaction.&lt;/span&gt;
&lt;span class="k"&gt;ROLLBACK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Explicitly cancels the two UPDATE operations and ends the transaction&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example performs the financial transfer first and then facilitates validation against business rules. If a custom business rule (like a daily transfer limit) is exceeded, &lt;code&gt;ROLLBACK&lt;/code&gt; cancels both balance updates, preventing the transaction from completing. This pattern is required for complex business logic that requires examining multiple data points—for example, subscription services validating usage limits after resource allocation, project management systems checking capacity constraints after task assignments, or social platforms enforcing interaction limits after engagement tracking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cascading Error Prevention
&lt;/h3&gt;

&lt;p&gt;Transactions prevent cascading failures across related operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Begin the transaction for all interdependent steps&lt;/span&gt;

&lt;span class="c1"&gt;-- Primary financial transfer operations&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;750&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;750&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Secondary operation: Log the transaction details&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from_account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transaction_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="mi"&gt;750&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'transfer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'completed'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Tertiary operation: Update 'updated_at' timestamps on affected accounts&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Commit all three operations together as one atomic unit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any operation in this chain fails—whether the balance updates, transaction logging, or timestamp updates—the entire sequence rolls back. This prevents scenarios where your primary data changes but supporting operations fail, leaving your system in an inconsistent state.&lt;/p&gt;

&lt;p&gt;Applications managing complex workflows depend on this all-or-nothing behavior: order processing systems that must update inventory, payment records, and shipping tables together; user registration flows that create accounts, set permissions, and send notifications atomically; or content-publishing pipelines that update articles, search indexes, and cache layers as coordinated units.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection-Failure Recovery
&lt;/h3&gt;

&lt;p&gt;Network interruptions during transactions automatically trigger rollbacks, protecting against partial updates when client connections drop unexpectedly. This built-in protection ensures that even infrastructure failures cannot corrupt your data through incomplete operations.&lt;/p&gt;

&lt;p&gt;While single-user scenarios benefit significantly from error handling, the real complexity emerges when multiple users access your database simultaneously, creating race conditions that require more sophisticated transaction management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preventing Race Conditions and Concurrency Issues
&lt;/h2&gt;

&lt;p&gt;Race conditions occur when multiple transactions attempt to read and modify the same data simultaneously, creating unpredictable results that corrupt data integrity. These issues manifest most commonly in high-traffic applications where users compete for limited resources—duplicate bookings in event systems, oversold inventory in e-commerce platforms, or conflicting account updates in financial applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Classic Race-Condition Scenario
&lt;/h3&gt;

&lt;p&gt;Consider two users simultaneously transferring money from the same account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- User A's transaction: Wants to withdraw $800&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- User A reads balance: $1000&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- User A calculates new balance: $200&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- User B's transaction (simultaneously): Wants to withdraw $300&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- User B also reads balance: $1000 (before User A's commit)&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- User B calculates new balance: $700&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both transactions read the same initial balance of $1,000, but the final result depends on which transaction commits last.&lt;/p&gt;

&lt;p&gt;If user B commits after user A, user B's update (setting balance to $700) will overwrite user A's change (which would have set it to $200). The account would end up with $700 when it should have $200 ($1000 − $800) minus $300, or −$100.&lt;/p&gt;

&lt;p&gt;This "lost update" causes money to appear or disappear incorrectly. This same pattern destroys data integrity in inventory systems where multiple customers purchase the last item, booking platforms where seats get double-reserved, or content-management systems where collaborative editing overwrites changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Transaction-Isolation Levels
&lt;/h3&gt;

&lt;p&gt;Postgres accepts four isolation-level settings that control how transactions interact with concurrent operations: &lt;code&gt;READ UNCOMMITTED&lt;/code&gt;, &lt;code&gt;READ COMMITTED&lt;/code&gt;, &lt;code&gt;REPEATABLE READ&lt;/code&gt;, and &lt;code&gt;SERIALIZABLE&lt;/code&gt;. However, Postgres doesn't actually implement &lt;code&gt;READ UNCOMMITTED&lt;/code&gt; as a distinct isolation level—it silently upgrades any &lt;code&gt;READ UNCOMMITTED&lt;/code&gt; transaction to &lt;code&gt;READ COMMITTED&lt;/code&gt; for consistency. This means Postgres effectively provides three distinct isolation behaviors, with &lt;code&gt;READ COMMITTED&lt;/code&gt; serving as both the default and the lowest functional isolation level.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;READ COMMITTED&lt;/code&gt; allows transactions to see committed changes from other concurrent transactions. While this prevents "dirty reads" (reading uncommitted data), it can lead to "nonrepeatable reads," where a repeated query within the same transaction returns different results because another transaction committed changes in between:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;TRANSACTION&lt;/span&gt; &lt;span class="k"&gt;ISOLATION&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;READ&lt;/span&gt; &lt;span class="k"&gt;COMMITTED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Postgres's default isolation level&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Start Transaction A&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Transaction A reads balance: $1000&lt;/span&gt;

&lt;span class="c1"&gt;-- At this point, another transaction (Transaction B) might commit a $200 withdrawal from ACC-001.&lt;/span&gt;
&lt;span class="c1"&gt;-- The balance in the database is now $800.&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- Transaction A reads balance again: $800 (a non-repeatable read)&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This behavior suits applications where seeing the most recent data is more important than strict consistency within a single transaction's multiple reads, such as social media feeds displaying ever-updating like counts, news websites where articles are frequently revised, or real-time analytics dashboards where the latest metrics are prioritized over a perfectly frozen historical view within a short session.&lt;/p&gt;

&lt;p&gt;For higher guarantees, &lt;code&gt;REPEATABLE READ&lt;/code&gt; ensures that repeated reads return the same values throughout a transaction, preventing nonrepeatable reads, but it can still allow "phantom reads" (where new rows appear in a result set that was previously empty or smaller).&lt;/p&gt;

&lt;p&gt;Finally, &lt;code&gt;SERIALIZABLE&lt;/code&gt; provides the strongest isolation by preventing all concurrency anomalies, including dirty reads, nonrepeatable reads, and phantom reads. It effectively makes concurrent transactions appear to execute sequentially, guaranteeing that the outcome is the same as if there were no concurrency at all.&lt;/p&gt;

&lt;p&gt;For applications where the highest degree of data integrity and consistency is paramount, such as financial and booking systems, &lt;code&gt;SERIALIZABLE&lt;/code&gt; isolation is often the preferred choice to eliminate complex race conditions and ensure predictable outcomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Row-Level Locking with SELECT FOR UPDATE
&lt;/h3&gt;

&lt;p&gt;You can also prevent race conditions in a read-modify-write scenario by explicitly locking rows during the operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Select the row and place an exclusive lock on it&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Perform the update; other transactions attempting to FOR UPDATE this row will now wait&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- The lock is released when the transaction commits or rolls back&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;FOR UPDATE&lt;/code&gt; clause creates an exclusive lock on the selected row, forcing other transactions attempting the same operation to wait until the current transaction commits. This eliminates race conditions by serializing access to contested resources.&lt;/p&gt;

&lt;p&gt;Event-booking systems use this technique to prevent double reservations by locking seat records during the booking process. E-commerce platforms lock inventory records during purchase transactions to prevent overselling. Social media applications lock user profiles during complex update operations to prevent conflicting modifications.&lt;/p&gt;

&lt;p&gt;However, while &lt;code&gt;SELECT FOR UPDATE&lt;/code&gt; offers a targeted solution by making conflicting transactions wait, &lt;code&gt;SERIALIZABLE&lt;/code&gt; provides a broader isolation level that ensures complete transactional correctness across all operations by preventing any concurrency anomalies.&lt;/p&gt;

&lt;p&gt;Which to use depends on your specific use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SELECT FOR UPDATE&lt;/code&gt; is ideal for explicit "read-modify-write" patterns on known, frequently contested rows, offering predictable blocking behavior.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SERIALIZABLE&lt;/code&gt; provides the strongest guarantee against all concurrency issues for an entire transaction, but it requires your application to handle transaction retries (re-executing the transaction when conflicts are detected) when Postgres detects a serialization conflict.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Summing up, use &lt;code&gt;SERIALIZABLE&lt;/code&gt; for complex business logic where absolute data integrity across diverse operations is paramount, even at the cost of occasional retries.&lt;/p&gt;

&lt;p&gt;Understanding these concurrency control mechanisms becomes crucial when implementing transactions through Supabase's various interfaces, where different approaches offer distinct advantages for different use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing Transactions in Supabase
&lt;/h2&gt;

&lt;p&gt;Supabase offers multiple approaches for implementing transactions, each suited to different architectural patterns and complexity requirements. Understanding when to use manual SQL transactions versus programmatic approaches ensures you choose the optimal strategy for your application's needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manual Transactions via SQL Editor
&lt;/h3&gt;

&lt;p&gt;The SQL Editor provides direct access to Postgres's transaction capabilities for administrative tasks, data migrations, or one-off operations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrx4ctux0sk6qamuuhg9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrx4ctux0sk6qamuuhg9.png" alt="Demo transaction in Supabase SQL editor" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This direct SQL approach to transactions is ideal for scenarios requiring precise, one-off control over your database, such as administrative tasks like manually correcting a corrupted record after an incident, performing data migrations where a set of changes must be applied atomically, or executing ad hoc operations that need strong transactional guarantees outside of your application's regular workflow.&lt;/p&gt;

&lt;p&gt;For instance, in an e-commerce system, you might use this approach to manually reverse a fraudulent order's inventory update and credit. In healthcare, it could be used for a critical, one-time data cleanup of patient records. However, integrating this level of transactional control into your application's regular, user-facing features typically requires programmatic solutions that integrate seamlessly with your frontend code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Functions with RPC Calls
&lt;/h3&gt;

&lt;p&gt;Supabase recommends defining business logic directly within Postgres as &lt;a href="https://supabase.com/docs/guides/database/functions" rel="noopener noreferrer"&gt;functions&lt;/a&gt; (also known as stored procedures) and then executing them using RPC.&lt;/p&gt;

&lt;p&gt;This method encapsulates the entire transaction logic within the database itself, ensuring atomicity and data integrity regardless of client-side or network failures. You interact with these powerful server-side functions using Supabase's client libraries, such as &lt;code&gt;supabase-js&lt;/code&gt; for JavaScript, enabling seamless communication from your frontend code.&lt;/p&gt;

&lt;p&gt;Here's a sample JavaScript snippet demonstrating how a client-side application initiates a complex database operation with a single RPC call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Example of invoking a pre-defined Postgres function named `transfer_money`&lt;/span&gt;
&lt;span class="c1"&gt;// using Supabase's JavaScript client library (`supabase.rpc`).&lt;/span&gt;
&lt;span class="c1"&gt;// This function on the database server would contain the SQL operations for a money transfer.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;transfer_money&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;sender_account_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ACC-001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;receiver_account_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ACC-002&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;transfer_amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;150.00&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Handle the response from the RPC call&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Transaction failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Log any error returned by the database function&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Transfer successful:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Confirm successful completion&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The advantage of this approach lies in how Postgres handles these function executions: It automatically wraps the entire function's logic in a single, robust transaction. This means if any operation within the &lt;code&gt;transfer_money&lt;/code&gt; function fails due to connection interruptions between individual SQL commands originating from the client, all changes roll back automatically. &lt;/p&gt;

&lt;h3&gt;
  
  
  Edge Functions for Complex Transaction Logic
&lt;/h3&gt;

&lt;p&gt;For sophisticated business logic requiring external API calls, advanced data validation, or complex conditional operations that cannot reside solely within the database, &lt;a href="https://supabase.com/docs/guides/functions" rel="noopener noreferrer"&gt;Supabase Edge Functions&lt;/a&gt; provide the ideal environment. They act as server-side handlers that can connect directly to your database, giving you programmatic control over transaction flow.&lt;/p&gt;

&lt;p&gt;The following TypeScript code demonstrates an Edge Function handling a transfer request. It includes custom validation and orchestrates the core database transaction via an RPC call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[https://esm.sh/@supabase/supabase-js@2](https://esm.sh/@supabase/supabase-js@2)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;Deno&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SUPABASE_URL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Retrieve Supabase URL from environment variables&lt;/span&gt;
  &lt;span class="nx"&gt;Deno&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SUPABASE_SERVICE_ROLE_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt; &lt;span class="c1"&gt;// Use a service role key for elevated privileges&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleComplexTransfer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reason&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Complex validation logic that might go beyond SQL constraints, executed server-side&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;suspicious&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Transfer blocked for suspicious reason&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Execute the core atomic database transaction via an RPC call to a Postgres function&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;transfer_money&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;sender_account_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;receiver_account_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;transfer_amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;amount&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Return the result of the database operation to the client&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edge Functions excel in scenarios where your transactional logic must extend beyond the database's direct capabilities.&lt;/p&gt;

&lt;p&gt;For example, in a payment processing system, an Edge Function could validate a credit card with an external payment gateway API before committing the transaction to the database. In a user-onboarding workflow, it might create a user record in Postgres and then call a third-party email service to send a welcome email, ensuring both steps are coordinated. For complex real-time bidding platforms, an Edge Function could enforce elaborate pricing logic or integrate with external analytics services before finalizing a bid in the database. They provide the flexibility of server-side code while maintaining core transaction integrity by delegating atomic database operations to Postgres RPC calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing the Right Approach
&lt;/h3&gt;

&lt;p&gt;Database functions via RPC suit most transaction scenarios—financial transfers, inventory updates, and user registration workflows. Edge Functions are needed when business logic extends beyond database operations to include external API interactions, complex validation requiring multiple data sources, or custom authentication flows.&lt;/p&gt;

&lt;p&gt;Crucially, both approaches maintain ACID properties while offering different levels of flexibility for your application architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices for Transactions
&lt;/h2&gt;

&lt;p&gt;Effective transaction management requires balancing data integrity with performance considerations. Here are some practices to ensure your applications maintain consistency while avoiding common pitfalls that can degrade system performance or create deadlock scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keep Transactions Short and Focused
&lt;/h3&gt;

&lt;p&gt;Minimize transaction duration by performing only essential operations within transaction boundaries. Long-running transactions hold locks longer, increasing contention and reducing overall system throughput:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Good: Focused transaction, only includes critical database operations&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from_account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transaction_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Avoid: Including unrelated, non-database operations within the transaction&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Do NOT include operations like sending emails, uploading files to S3, or making external API calls here.&lt;/span&gt;
&lt;span class="c1"&gt;-- These operations are slow and do not require transactional atomicity with the database.&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACC-002'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Performing business logic, external API calls, or complex calculations outside transaction boundaries prevents unnecessary lock retention. Reserve transactions exclusively for database operations that must execute atomically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Database Functions for Complex Logic
&lt;/h3&gt;

&lt;p&gt;Encapsulate multistep transaction logic within Postgres functions called via RPC. This approach minimizes network round-trip times and ensures atomic execution regardless of client-side failures.&lt;/p&gt;

&lt;p&gt;As explained, database functions also automatically wrap their contents in transactions, eliminating the risk of partial updates due to network interruptions between separate SQL commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implement Robust Error Handling
&lt;/h3&gt;

&lt;p&gt;Always include comprehensive error handling that accounts for both constraint violations and unexpected failures. Use try-catch blocks in Edge Functions and proper error checking with RPC calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Attempt to execute a complex database operation via RPC&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complex_operation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Check for specific database errors returned by the RPC&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Operation failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Based on error type, implement retry logic, roll back other application state, or notify the user&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Handle successful operation and continue application flow&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Operation successful:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Catch and handle unexpected network errors, Deno runtime errors in Edge Functions, etc.&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unexpected error during operation:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Ensure application state is consistent or user is informed&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Choose Appropriate Isolation Levels
&lt;/h3&gt;

&lt;p&gt;As discussed before, carefully select the appropriate transaction isolation level for your operations. While Postgres's default &lt;code&gt;READ COMMITTED&lt;/code&gt; suits many scenarios, consider &lt;code&gt;SERIALIZABLE&lt;/code&gt; for operations requiring stronger consistency guarantees to prevent specific concurrency anomalies. Remember that higher isolation levels may increase transaction retry requirements in high-contention scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Savepoints for Complex Scenarios
&lt;/h3&gt;

&lt;p&gt;For sophisticated business logic requiring partial rollbacks, use &lt;a href="https://www.postgresql.org/docs/current/sql-savepoint.html" rel="noopener noreferrer"&gt;Postgres's savepoint functionality&lt;/a&gt; within database functions. Savepoints allow rolling back to specific points without canceling entire transactions, providing fine-grained control over complex multistep operations.&lt;/p&gt;

&lt;p&gt;These practices ensure your transaction handling remains performant, reliable, and maintainable as your application scales to handle increasing concurrent users and complex business requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, I explored the critical role of database transactions in preserving data integrity, from understanding Postgres's foundational ACID properties to mastering advanced concurrency control with isolation levels and row-level locking. I also explained how to implement these robust transactional patterns effectively, whether through Supabase's SQL editor, powerful database functions (RPCs), or flexible Edge Functions for complex logic.&lt;/p&gt;

&lt;p&gt;If you apply these principles, you can build applications that ensure data remains consistent and reliable, even in the most demanding, high-traffic scenarios.&lt;/p&gt;

</description>
      <category>supabase</category>
      <category>postgressql</category>
      <category>database</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
