<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: FlorianSuchan</title>
    <description>The latest articles on DEV Community by FlorianSuchan (@floriansuchan).</description>
    <link>https://dev.to/floriansuchan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F672225%2F7d8251aa-8ccb-4bd8-8138-7b491d1d9234.jpeg</url>
      <title>DEV Community: FlorianSuchan</title>
      <link>https://dev.to/floriansuchan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/floriansuchan"/>
    <language>en</language>
    <item>
      <title>How We Used DBtune to Cut Our Postgres Query Time by 50% on AWS RDS</title>
      <dc:creator>FlorianSuchan</dc:creator>
      <pubDate>Fri, 05 Dec 2025 07:20:30 +0000</pubDate>
      <link>https://dev.to/floriansuchan/how-we-used-dbtune-to-cut-our-postgres-query-time-by-50-on-aws-rds-2a5e</link>
      <guid>https://dev.to/floriansuchan/how-we-used-dbtune-to-cut-our-postgres-query-time-by-50-on-aws-rds-2a5e</guid>
      <description>&lt;p&gt;&lt;em&gt;or: Why tuning Postgres still feels like rocket science - and why we're happy to offload it to a robot.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;At Papershift, we run our core product on &lt;strong&gt;Postgres 17.6 on AWS RDS&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;After a major version upgrade, our database was back on the &lt;strong&gt;AWS default parameter group&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Instead of hand-tuning Postgres again, we tried &lt;strong&gt;DBtune&lt;/strong&gt;, an automated tuning service that uses &lt;code&gt;pg_stat_statements&lt;/code&gt; plus ML-based algorithms.&lt;/li&gt;
&lt;li&gt;We ran a full-day tuning session with:

&lt;ul&gt;
&lt;li&gt;31 iterations × 15 minutes&lt;/li&gt;
&lt;li&gt;Guard rails to prevent performance regressions&lt;/li&gt;
&lt;li&gt;Optimization target: &lt;strong&gt;average query runtime&lt;/strong&gt; (no specific “Workload Fingerprint”)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Result:

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;~50% reduction in average query runtime&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;CPU load dropped from peaks around &lt;strong&gt;80% down to ~40–50%&lt;/strong&gt;, consistently across comparable days.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;A second tuning session later didn’t improve further.
&lt;/li&gt;

&lt;li&gt;For RDS users, especially if you’re still on defaults, tools like DBtune are, in my view, close to a &lt;strong&gt;no-brainer&lt;/strong&gt;.&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Who we are: Papershift
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.papershift.com" rel="noopener noreferrer"&gt;Papershift&lt;/a&gt; is a cloud-based HR and workforce management platform from Karlsruhe, Germany. We help businesses plan shifts, track working hours, manage absences and run payroll - all in one place. Instead of spreadsheets and manual processes, our customers manage their teams digitally, across rota planning, time tracking, leave management and payroll preparation.&lt;/p&gt;

&lt;p&gt;Under the hood, that means: lots of schedules, lots of time-tracking events, lots of HR data - and a Postgres-backed SaaS product that runs 24/7 with very spiky traffic between weekdays and weekends.&lt;/p&gt;




&lt;h2&gt;
  
  
  The starting point: fresh Postgres, generic tuning
&lt;/h2&gt;

&lt;p&gt;A couple of weeks ago we upgraded the database for our core product to &lt;strong&gt;Postgres 17.6 on AWS RDS&lt;/strong&gt;. We like to stay current with runtime versions - not just for security reasons, but also for incremental improvements in performance and observability that accumulate over time.&lt;/p&gt;

&lt;p&gt;We didn’t have a burning performance incident. This was more of an “academic” motivation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If we just upgraded Postgres anyway… are there low-hanging fruits we can grab by tuning it properly for our workload?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you’ve ever scrolled through the full Postgres parameter list, you know it’s massive. The RDS defaults are designed to be safe and generic for a broad range of customers. That’s reasonable - but it also means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They’re &lt;strong&gt;not tailored to your workload&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;They rarely match your traffic pattern or query mix.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the same time, Postgres tuning is complex enough that, unless you invest serious time and expertise, you’re unlikely to beat a good automated system by hand.&lt;/p&gt;

&lt;p&gt;So instead of going down the rabbit hole of manuals and blog posts again, we went looking for an automated approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Discovering DBtune
&lt;/h2&gt;

&lt;p&gt;A few years ago, we had used an online tool to help us tune Postgres manually. When we tried to find it again, we ended up discovering something else: &lt;strong&gt;&lt;a href="https://dbtune.com" rel="noopener noreferrer"&gt;DBtune&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;DBtune’s core idea is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Collect &lt;strong&gt;&lt;code&gt;pg_stat_statements&lt;/code&gt;&lt;/strong&gt; from your database.
&lt;/li&gt;
&lt;li&gt;Feed that into their &lt;strong&gt;ML-based tuning engine&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Let it iteratively propose and apply &lt;strong&gt;Postgres parameter changes&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Measure impact for each iteration and keep improving.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In other words: automated Postgres tuning, based on your real workload, not synthetic benchmarks.&lt;/p&gt;

&lt;p&gt;For each tuning session, DBtune:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Applies a new configuration via reload only, with no restarts (although restarts can be enabled if you like),&lt;/li&gt;
&lt;li&gt;Observes performance for a configurable time window (e.g. 15 minutes),&lt;/li&gt;
&lt;li&gt;Repeats this for up to &lt;strong&gt;31 iterations&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;Enforces &lt;strong&gt;performance guard rails&lt;/strong&gt; so things can’t silently get worse.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the &lt;a href="https://docs.dbtune.com/ensuring-production-safe-dbtune/" rel="noopener noreferrer"&gt;guard rail&lt;/a&gt; is violated, the current iteration is aborted and DBtune moves on to the next one. At the end, you get a final optimized parameter set.&lt;/p&gt;
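&lt;p&gt;The cycle described above can be sketched in a few lines of Python. To be clear, this is &lt;em&gt;not&lt;/em&gt; DBtune’s actual code - it is just an illustration of the iterate/observe/guard-rail loop, with made-up numbers:&lt;/p&gt;

```python
# Illustrative sketch of an iterate/observe/guard-rail tuning loop.
# NOT DBtune's implementation - just the cycle described above.

GUARD_RAIL = 1.5          # abort an iteration above 1.5x the baseline
baseline_avg_ms = 12.0    # avg query runtime measured before tuning

def run_session(candidate_configs, observe):
    """observe(config) returns the avg query runtime (ms) under config."""
    best_config, best_avg = None, baseline_avg_ms
    for config in candidate_configs:
        avg = observe(config)                   # a 15-minute window in reality
        if avg > baseline_avg_ms * GUARD_RAIL:  # guard rail violated:
            continue                            # abort, move to next iteration
        if best_avg > avg:                      # new best configuration found
            best_config, best_avg = config, avg
    return best_config, best_avg

# Toy "observations": config C is a regression and gets skipped.
observed = {"A": 9.0, "B": 6.1, "C": 25.0}
best, avg = run_session(["A", "B", "C"], observed.__getitem__)
print(best, avg)  # B 6.1
```

&lt;p&gt;The important property is the one the guard rail gives you: a bad candidate costs at most one observation window, never the whole session.&lt;/p&gt;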




&lt;h2&gt;
  
  
  Our infrastructure context
&lt;/h2&gt;

&lt;p&gt;A bit of context about how we run this in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our application stack runs on &lt;strong&gt;Kubernetes&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;The cluster itself is managed via &lt;a href="https://www.qovery.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Qovery&lt;/strong&gt;&lt;/a&gt;, which we’re very happy with,
&lt;/li&gt;
&lt;li&gt;Our primary database is &lt;strong&gt;AWS RDS for Postgres 17.6&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup made it easy to integrate DBtune:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We could run their &lt;strong&gt;agent container inside our cluster&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;That container lives “close” to the database,
&lt;/li&gt;
&lt;li&gt;We didn’t need to expose anything new to the public internet or change security groups for the database.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Setting up DBtune for RDS Postgres
&lt;/h2&gt;

&lt;p&gt;DBtune needs two things to work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;pg_stat_statements&lt;/code&gt; data&lt;/strong&gt; (for query-level performance insight)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access to AWS&lt;/strong&gt; to:

&lt;ul&gt;
&lt;li&gt;Read and modify the &lt;strong&gt;parameter group&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Read &lt;strong&gt;RDS performance metrics&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. Making sure &lt;code&gt;pg_stat_statements&lt;/code&gt; is ready
&lt;/h3&gt;

&lt;p&gt;We were already using &lt;code&gt;pg_stat_statements&lt;/code&gt;, so there was nothing special to change:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The extension was installed and active,
&lt;/li&gt;
&lt;li&gt;The tracking settings were good enough to capture a rich history.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only thing we did was: &lt;strong&gt;wait about a week&lt;/strong&gt;, so that &lt;code&gt;pg_stat_statements&lt;/code&gt; reflected a representative workload before we started tuning.&lt;/p&gt;
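&lt;p&gt;The global metric that matters later - average query runtime across the workload - can be derived directly from this data. A minimal sketch over rows as you would select them from &lt;code&gt;pg_stat_statements&lt;/code&gt; (the &lt;code&gt;calls&lt;/code&gt; and &lt;code&gt;mean_exec_time&lt;/code&gt; columns exist on Postgres 13 and later; the sample rows here are invented):&lt;/p&gt;

```python
# Call-weighted average runtime across a workload, computed from
# (query, calls, mean_exec_time_ms) rows as exposed by pg_stat_statements.
def workload_avg_runtime_ms(rows):
    total_time = sum(calls * mean_ms for _, calls, mean_ms in rows)
    total_calls = sum(calls for _, calls, _ in rows)
    return total_time / total_calls

# Toy rows standing in for a real SELECT from pg_stat_statements:
rows = [
    ("SELECT ... FROM shifts ...",   90_000, 4.0),   # hot, fast query
    ("SELECT ... FROM absences ...", 10_000, 40.0),  # rarer, slow query
]
print(workload_avg_runtime_ms(rows))  # 7.6
```

&lt;p&gt;Note how the hot query dominates the average even though the rare one is 10× slower - which is exactly why the stats need to reflect a representative week before tuning starts.&lt;/p&gt;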

&lt;h3&gt;
  
  
  2. Giving DBtune controlled access to RDS
&lt;/h3&gt;

&lt;p&gt;For AWS integration, DBtune doesn’t need full admin access. It needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read and update &lt;strong&gt;parameter groups&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;read &lt;strong&gt;CloudWatch / RDS metrics&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We created an &lt;strong&gt;IAM user + policy&lt;/strong&gt; that was scoped down to exactly that. DBtune uses these credentials to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understand the current configuration,
&lt;/li&gt;
&lt;li&gt;apply new parameter sets during tuning,
&lt;/li&gt;
&lt;li&gt;and track how the database is behaving.&lt;/li&gt;
&lt;/ul&gt;
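&lt;p&gt;As an illustration, a policy scoped this way might look like the following. The action names are plausible candidates for “read/update parameter groups, read metrics” (they are real IAM actions), but the authoritative list is in DBtune’s setup docs, and the account ID, region and parameter group name here are placeholders:&lt;/p&gt;

```python
import json

# Sketch of a narrowly scoped IAM policy of the kind described above.
# Resource ARN and names are placeholders; check DBtune's docs for the
# exact action list their agent requires.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ParameterGroupTuning",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances",
                "rds:DescribeDBParameters",
                "rds:ModifyDBParameterGroup",
            ],
            # Scope to the one cloned parameter group, not "*":
            "Resource": "arn:aws:rds:eu-central-1:123456789012:pg:dbtune-tuning-group",
        },
        {
            "Sid": "ReadMetrics",
            "Effect": "Allow",
            "Action": ["cloudwatch:GetMetricData"],
            "Resource": "*",
        },
    ],
}
print(json.dumps(policy, indent=2))
```

&lt;p&gt;The point of the &lt;code&gt;Resource&lt;/code&gt; scoping is the same as cloning the parameter group later: the credentials can only touch the one object you intend to tune.&lt;/p&gt;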

&lt;h3&gt;
  
  
  3. Running the DBtune agent in Kubernetes via Qovery
&lt;/h3&gt;

&lt;p&gt;DBtune provides a &lt;strong&gt;Docker image&lt;/strong&gt; for their agent. In our case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We deployed the agent as a &lt;strong&gt;pod inside our Kubernetes cluster&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;Managed via &lt;strong&gt;Qovery&lt;/strong&gt;, just like our other services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent’s configuration is entirely done via &lt;strong&gt;environment variables&lt;/strong&gt;, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS region
&lt;/li&gt;
&lt;li&gt;Target database / RDS instance identifier
&lt;/li&gt;
&lt;li&gt;Name of the &lt;strong&gt;parameter group&lt;/strong&gt; that DBtune is allowed to change
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because the agent runs inside the same network boundary as our workloads, we didn’t have to touch security groups at all. Network-wise it was just “another internal service”.&lt;/p&gt;
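&lt;p&gt;Conceptually, the agent’s startup looks like any other env-configured container. The variable names below are made up for illustration - the real ones are in DBtune’s agent documentation:&lt;/p&gt;

```python
import os

# Sketch of reading an agent-style configuration from environment
# variables. Variable names and values are hypothetical.
REQUIRED = ["AWS_REGION", "RDS_INSTANCE_ID", "RDS_PARAMETER_GROUP"]

def load_config(env=os.environ):
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError("missing env vars: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}

cfg = load_config({
    "AWS_REGION": "eu-central-1",
    "RDS_INSTANCE_ID": "core-product-db",
    "RDS_PARAMETER_GROUP": "dbtune-tuning-group",
})
print(cfg["RDS_PARAMETER_GROUP"])  # dbtune-tuning-group
```

&lt;p&gt;Failing fast on missing variables is worth it here: a half-configured tuning agent pointed at the wrong instance is exactly the kind of surprise you don’t want in production.&lt;/p&gt;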




&lt;h2&gt;
  
  
  Cloning the parameter group (and why this matters)
&lt;/h2&gt;

&lt;p&gt;One important step before you let any tool change your parameters:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Never tune the shared AWS default parameter group.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The default group can be used by multiple instances. You don’t want a tuning session for one database to unexpectedly affect others.&lt;/p&gt;

&lt;p&gt;So we did:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone the RDS default parameter group&lt;/strong&gt; to a &lt;a href="https://docs.dbtune.com/aws-rds/" rel="noopener noreferrer"&gt;new custom one&lt;/a&gt;,
&lt;/li&gt;
&lt;li&gt;Attach that custom parameter group only to our &lt;strong&gt;production database&lt;/strong&gt; that we wanted to tune.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This way, DBtune has a dedicated playground - and we know exactly which database is being changed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Too many statements for a “Workload Fingerprint”
&lt;/h2&gt;

&lt;p&gt;DBtune supports two different concepts for tuning a database: so-called &lt;a href="https://docs.dbtune.com/fingerprint-workload/" rel="noopener noreferrer"&gt;"Workload Fingerprints"&lt;/a&gt; and global tuning. A &lt;strong&gt;“Workload Fingerprint”&lt;/strong&gt; is a curated subset of queries or query patterns that generate the majority of the load. You can tell DBtune: “focus on these patterns when evaluating improvements.”&lt;/p&gt;

&lt;p&gt;In our case, that didn’t work.&lt;/p&gt;

&lt;p&gt;Our &lt;code&gt;pg_stat_statements&lt;/code&gt; output is huge - &lt;strong&gt;hundreds of thousands of entries&lt;/strong&gt;. The attempt to derive a meaningful Workload Fingerprint from this didn’t produce useful recommendations.&lt;/p&gt;

&lt;p&gt;Rather than forcing it, we chose the other option they offer, a more global metric:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Optimize for average query runtime across the workload.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a good fit for us because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our workload is diverse,
&lt;/li&gt;
&lt;li&gt;We care about overall latency and database health,
&lt;/li&gt;
&lt;li&gt;We didn’t want to artificially bias the tuning to a couple of hand-picked queries.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Guard rails: don’t break production
&lt;/h2&gt;

&lt;p&gt;One thing we liked immediately: DBtune has &lt;strong&gt;performance guard rails&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Iteration duration:&lt;/strong&gt; 15 minutes
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum allowed regression:&lt;/strong&gt; 1.5× of the baseline average query time
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practical terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each iteration runs for 15 minutes with a new configuration.
&lt;/li&gt;
&lt;li&gt;If the average query runtime during that iteration exceeds &lt;strong&gt;150%&lt;/strong&gt; of the baseline,

&lt;ul&gt;
&lt;li&gt;DBtune &lt;strong&gt;immediately aborts&lt;/strong&gt; that iteration,
&lt;/li&gt;
&lt;li&gt;and proceeds to the next.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This gives you good confidence to run this &lt;strong&gt;against production&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing the right day: Saturdays, not Mondays
&lt;/h2&gt;

&lt;p&gt;Our traffic is very uneven:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weekdays vs weekends,
&lt;/li&gt;
&lt;li&gt;Day vs night.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the &lt;strong&gt;first tuning session&lt;/strong&gt;, we wanted to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;See &lt;strong&gt;real traffic&lt;/strong&gt;, not synthetic load,
&lt;/li&gt;
&lt;li&gt;But also be able to absorb temporary regressions without harming too many customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we scheduled the session for a &lt;strong&gt;Saturday during daytime&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enough real activity to be representative,
&lt;/li&gt;
&lt;li&gt;Lower overall load than a typical weekday.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We configured DBtune for the full session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;31 iterations × 15 minutes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;That nicely covers a full working day of tuning.&lt;/li&gt;
&lt;/ul&gt;
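&lt;p&gt;The back-of-the-envelope math for why 31 × 15 minutes “nicely covers a full working day”:&lt;/p&gt;

```python
# 31 iterations of 15 minutes each - how long is the whole session?
iterations, minutes_each = 31, 15
total_minutes = iterations * minutes_each
hours, minutes = divmod(total_minutes, 60)
print(total_minutes, hours, minutes)  # 465 7 45
```

&lt;p&gt;465 minutes is 7 hours 45 minutes - roughly one Saturday daytime of real traffic.&lt;/p&gt;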




&lt;h2&gt;
  
  
  The results: 50% faster queries, much lower CPU
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbdxjibas4uw4gwahq7d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbdxjibas4uw4gwahq7d.png" alt="Summary of first tuning session" width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The outcome of the first tuning session was surprisingly strong. The yellow bars indicate warnings because the 1.5× guard rail was hit - yet those iterations were still better than the default configuration. The reason: the queries that ran in this window were "by accident" faster than our average. Again: our workload is very uneven.&lt;/p&gt;

&lt;p&gt;Looking at our metrics across comparable days (e.g. a weekday before vs. a weekday after):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouhat84jqerdhox5trxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouhat84jqerdhox5trxd.png" alt="RDS performance insights" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Average query runtime dropped by about 50%.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;RDS &lt;strong&gt;CPU utilization&lt;/strong&gt; went down significantly:

&lt;ul&gt;
&lt;li&gt;before: peaks around &lt;strong&gt;80%&lt;/strong&gt; on busy days,
&lt;/li&gt;
&lt;li&gt;after: more typically around &lt;strong&gt;30–50%&lt;/strong&gt;, even under comparable traffic.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;We didn’t stop at a single day of data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88tn6q06v1dqgsbtha6t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88tn6q06v1dqgsbtha6t.png" alt="CPU Utilization" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We monitored the new configuration over &lt;strong&gt;several days and weeks&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;The improvements stayed stable and consistent,
&lt;/li&gt;
&lt;li&gt;We didn’t see any weird regressions elsewhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From an application perspective, this translated into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More headroom under load,
&lt;/li&gt;
&lt;li&gt;Smoother performance during peak usage,
&lt;/li&gt;
&lt;li&gt;And frankly: a healthier-looking database across the board.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What actually changed? (Without drowning in details)
&lt;/h2&gt;

&lt;p&gt;DBtune touched a range of Postgres parameters, covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory settings&lt;/strong&gt; (e.g. how much memory is available for caching and query execution),
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planner-related parameters&lt;/strong&gt; (how aggressively the planner explores plan space),
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAL and checkpoint configuration&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;Various &lt;strong&gt;runtime knobs&lt;/strong&gt; that affect how Postgres balances throughput vs latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsfvq6ugb99jiiabjs8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsfvq6ugb99jiiabjs8j.png" alt="Postgres parameters modified" width="672" height="1240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The exact values and combinations are less interesting than the meta-point:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is the kind of configuration space that is tedious and risky to explore manually, but feels perfectly suited for an automated, iterative, data-driven approach.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Second tuning session: diminishing returns
&lt;/h2&gt;

&lt;p&gt;After the success of the first run, we scheduled a &lt;strong&gt;second tuning session&lt;/strong&gt;, this time during the week.&lt;/p&gt;

&lt;p&gt;We kept a similar setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;31 iterations,
&lt;/li&gt;
&lt;li&gt;15-minute windows,
&lt;/li&gt;
&lt;li&gt;Same guard rail configuration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This time, however, the outcome was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No meaningful additional improvement,
&lt;/li&gt;
&lt;li&gt;Essentially a “plateau”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Two plausible explanations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Traffic during the tuning window wasn’t representative.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A 15-minute window is small; if the pattern during that time doesn’t align with your typical load mix, improvements may not generalize.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DBtune already found a near-optimal configuration&lt;/strong&gt; in the first run.&lt;br&gt;&lt;br&gt;
At some point, changing parameters further will mostly shuffle performance between specific edge cases without improving the global picture.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice, we were happy with “no change.” It confirmed that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first tuning session took us to a good place,&lt;/li&gt;
&lt;li&gt;We weren’t leaving obvious gains on the table.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why we like automated Postgres tuning
&lt;/h2&gt;

&lt;p&gt;The experience with DBtune highlighted a few things for us:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Postgres tuning is still rocket science
&lt;/h3&gt;

&lt;p&gt;There are &lt;strong&gt;a lot&lt;/strong&gt; of Postgres parameters. Many of them interact in non-obvious ways. On RDS, some are also managed or constrained by AWS.&lt;/p&gt;

&lt;p&gt;If you’re not spending serious time deep-diving into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;shared_buffers&lt;/code&gt;, &lt;code&gt;work_mem&lt;/code&gt;, &lt;code&gt;effective_cache_size&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;parallelism and planner cost settings,&lt;/li&gt;
&lt;li&gt;WAL and checkpoint tuning,&lt;/li&gt;
&lt;li&gt;autovacuum and maintenance knobs,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;you’re unlikely to systematically explore all useful combinations for your specific workload.&lt;/p&gt;

&lt;p&gt;Letting a machine do this iteratively, with real workload data, just makes sense.&lt;/p&gt;
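&lt;p&gt;To put a number on “systematically explore”: even a coarse three-value grid over just five of the knobs above gives hundreds of candidate configurations. The parameter names are real Postgres settings; the grid values are arbitrary examples:&lt;/p&gt;

```python
from itertools import product

# A coarse grid over a handful of the knobs mentioned above. Each
# candidate would need its own observation window to evaluate fairly.
grid = {
    "shared_buffers":       ["2GB", "4GB", "8GB"],
    "work_mem":             ["4MB", "16MB", "64MB"],
    "effective_cache_size": ["8GB", "16GB", "24GB"],
    "random_page_cost":     ["1.1", "2.0", "4.0"],
    "max_wal_size":         ["1GB", "4GB", "16GB"],
}
candidates = list(product(*grid.values()))
print(len(candidates))  # 243 combinations from just 5 knobs

# At 15 minutes of observation per candidate, an exhaustive sweep takes:
print(len(candidates) * 15 / 60, "hours")  # 60.75 hours
```

&lt;p&gt;And that is before parameter interactions, time-of-day effects, or any of the dozens of knobs left out of the grid - which is why a guided, iterative search beats brute force here.&lt;/p&gt;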

&lt;h3&gt;
  
  
  2. Production-safe experimentation
&lt;/h3&gt;

&lt;p&gt;The combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short iteration windows,&lt;/li&gt;
&lt;li&gt;Measured metrics per iteration,&lt;/li&gt;
&lt;li&gt;Hard guard rails on regression,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;made it feel safe to run this on our &lt;strong&gt;production database&lt;/strong&gt; without babysitting every minute.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Time and focus
&lt;/h3&gt;

&lt;p&gt;The actual human time we invested was modest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Registering and onboarding,&lt;/li&gt;
&lt;li&gt;Deploying the agent in our Kubernetes cluster via Qovery,&lt;/li&gt;
&lt;li&gt;Setting up IAM permissions and the cloned parameter group,&lt;/li&gt;
&lt;li&gt;Testing the setup on a staging environment,&lt;/li&gt;
&lt;li&gt;Running the tuning session on production,&lt;/li&gt;
&lt;li&gt;Monitoring and sanity-checking results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to a manual tuning project, that’s very little.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Free tier fits experimentation
&lt;/h3&gt;

&lt;p&gt;DBtune’s free tier currently lets you tune up to &lt;strong&gt;three databases&lt;/strong&gt; before you need a paid plan. That’s enough to try it on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A staging environment,
&lt;/li&gt;
&lt;li&gt;One or two production instances,
&lt;/li&gt;
&lt;li&gt;Or multiple services if you run several Postgres clusters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given the potential upside in performance and cost, the cost–benefit ratio is very attractive.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons learned and recommendations
&lt;/h2&gt;

&lt;p&gt;If you’re running Postgres on AWS RDS, here are some takeaways from our journey:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Clone defaults into a dedicated parameter group.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Never tune the shared default; always work on a per-instance group.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Invest in observability first.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Make sure &lt;code&gt;pg_stat_statements&lt;/code&gt; is installed and working well before tuning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use guard rails when tuning in production.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Whether via DBtune or another tool, enforce clear limits on acceptable regressions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pick realistic windows and times.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
We had great success running our first session on a &lt;strong&gt;low-to-medium traffic Saturday&lt;/strong&gt; rather than at absolute peak load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Validate over weeks, not just hours.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Don’t declare victory based on a single day. Track after-effects and stability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don’t feel bad about not being a Postgres tuning expert.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Even if you’re comfortable with Postgres, there’s a lot of subtlety. Offloading the search to an automated system is not a cop-out - it’s efficient.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What’s next for us
&lt;/h2&gt;

&lt;p&gt;We’re in touch with the DBtune team and had a great experience with their support and responsiveness. They’re continuing to improve how they handle &lt;strong&gt;very large &lt;code&gt;pg_stat_statements&lt;/code&gt; datasets&lt;/strong&gt;, and we’re hoping that in the future we’ll be able to use more advanced “Workload Fingerprint”-based tuning on top of our global optimization.&lt;/p&gt;

&lt;p&gt;For now, the outcome is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our database is &lt;strong&gt;faster&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Our CPU headroom is &lt;strong&gt;healthier&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Our customers benefit from a more responsive system.
&lt;/li&gt;
&lt;li&gt;And we didn’t have to turn ourselves into full-time Postgres tuning experts to get there.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re on RDS, sitting on the default parameter group, and you’ve been putting off tuning because it feels like rocket science... we’d say: &lt;strong&gt;let a robot take the first shot.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>aws</category>
      <category>performance</category>
      <category>database</category>
    </item>
  </channel>
</rss>
