FlorianSuchan

How We Used DBtune to Cut Our Postgres Query Time by 50% on AWS RDS

or: Why tuning Postgres still feels like rocket science - and why we're happy to offload it to a robot.


TL;DR

  • At Papershift, we run our core product on Postgres 17.6 on AWS RDS.
  • After a major version upgrade, our database was back on the AWS default parameter group.
  • Instead of hand-tuning Postgres again, we tried DBtune, an automated tuning service that uses pg_stat_statements plus ML-based algorithms.
  • We ran a full-day tuning session with:
    • 31 iterations × 15 minutes
    • Guard rails to prevent performance regressions
    • Optimization target: average query runtime (no specific “Workload Fingerprint”)
  • Result:
    • ~50% reduction in average query runtime
    • CPU load dropped from peaks around 80% down to ~40–50%, consistently across comparable days.
  • A second tuning session later didn’t improve further.
  • For RDS users, especially if you’re still on defaults, tools like DBtune are, in my view, close to a no-brainer.

Who we are: Papershift

Papershift is a cloud-based HR and workforce management platform from Karlsruhe, Germany. We help businesses plan shifts, track working hours, manage absences and run payroll - all in one place. Instead of spreadsheets and manual processes, our customers manage their teams digitally, across rota planning, time tracking, leave management and payroll preparation.

Under the hood, that means: lots of schedules, lots of time-tracking events, lots of HR data - and a Postgres-backed SaaS product that runs 24/7 with very spiky traffic between weekdays and weekends.


The starting point: fresh Postgres, generic tuning

A couple of weeks ago we upgraded the database for our core product to Postgres 17.6 on AWS RDS. We like to stay current with runtime versions - not just for security reasons, but also for incremental improvements in performance and observability that accumulate over time.

We didn’t have a burning performance incident. This was more of an “academic” motivation:

If we just upgraded Postgres anyway… is there low-hanging fruit we can grab by tuning it properly for our workload?

If you’ve ever scrolled through the full Postgres parameter list, you know it’s massive. The RDS defaults are designed to be safe and generic for a broad range of customers. That’s reasonable - but it also means:

  • They’re not tailored to your workload,
  • They rarely match your traffic pattern or query mix.

At the same time, Postgres tuning is complex enough that, unless you invest serious time and expertise, you’re unlikely to beat a good automated system by hand.

So instead of going down the rabbit hole of manuals and blog posts again, we went looking for an automated approach.


Discovering DBtune

A few years ago, we had used an online tool to help us tune Postgres manually. When we tried to find it again, we ended up discovering something else: DBtune.

DBtune’s core idea is simple:

  1. Collect pg_stat_statements from your database.
  2. Feed that into their ML-based tuning engine.
  3. Let it iteratively propose and apply Postgres parameter changes.
  4. Measure impact for each iteration and keep improving.

In other words: automated Postgres tuning, based on your real workload, not synthetic benchmarks.

For each tuning session, DBtune:

  • Applies a new configuration via reload only - no restarts (although restarts can be enabled if you like),
  • Observes performance for a configurable time window (e.g. 15 minutes),
  • Repeats this for up to 31 iterations,
  • Enforces performance guard rails so things can’t silently get worse.

If a guard rail is violated, the current iteration is aborted and DBtune moves on to the next one. At the end, you get a final optimized parameter set.
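
As a side note on “reload only”: Postgres itself tells you which parameters need a restart. Anything with context = 'postmaster' in pg_settings requires one; everything else can be applied with a reload. A minimal way to check this yourself (psycopg2, placeholder connection string):

```python
import psycopg2

# Placeholder DSN - point this at your own instance.
conn = psycopg2.connect("postgresql://app:secret@my-rds-host:5432/mydb")

with conn, conn.cursor() as cur:
    # context = 'postmaster' means a restart is required;
    # 'sighup', 'user', 'superuser', ... can be applied via reload.
    cur.execute("""
        SELECT name, context
        FROM pg_settings
        WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size',
                       'max_wal_size', 'random_page_cost', 'max_connections')
        ORDER BY name
    """)
    for name, context in cur.fetchall():
        verdict = "restart required" if context == "postmaster" else "reload is enough"
        print(f"{name:25s} {verdict}")
```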


Our infrastructure context

A bit of context about how we run this in production:

  • Our application stack runs on Kubernetes,
  • The cluster itself is managed via Qovery, which we’re very happy with,
  • Our primary database is AWS RDS for Postgres 17.6.

This setup made it easy to integrate DBtune:

  • We could run their agent container inside our cluster,
  • That container lives “close” to the database,
  • We didn’t need to expose anything new to the public internet or change security groups for the database.

Setting up DBtune for RDS Postgres

DBtune needs two things to work:

  1. pg_stat_statements data (for query-level performance insight)
  2. Access to AWS to:
    • Read and modify the parameter group
    • Read RDS performance metrics

1. Making sure pg_stat_statements is ready

We were already using pg_stat_statements, so there was nothing special to change:

  • The extension was installed and active,
  • The tracking settings were good enough to capture a rich history.

The only thing we did was wait about a week, so that pg_stat_statements reflected a representative workload before we started tuning.
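
If you want to sanity-check that yourself before a session, a couple of queries are enough: is the extension installed, how many statements does it track, and since when. A minimal sketch (psycopg2, placeholder connection string; pg_stat_statements_info requires Postgres 14+):

```python
import psycopg2

conn = psycopg2.connect("postgresql://app:secret@my-rds-host:5432/mydb")  # placeholder DSN

with conn, conn.cursor() as cur:
    # Is the extension installed, and which version?
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'pg_stat_statements'")
    row = cur.fetchone()
    print("pg_stat_statements version:", row[0] if row else "not installed")

    # How many statements are currently tracked?
    cur.execute("SELECT count(*) FROM pg_stat_statements")
    print("tracked statements:", cur.fetchone()[0])

    # When were the statistics last reset? (Postgres 14+)
    cur.execute("SELECT stats_reset FROM pg_stat_statements_info")
    print("stats last reset:", cur.fetchone()[0])
```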

2. Giving DBtune controlled access to RDS

For AWS integration, DBtune doesn’t need full admin access. It needs to:

  • read and update parameter groups,
  • read CloudWatch / RDS metrics.

We created an IAM user + policy that was scoped down to exactly that. DBtune uses these credentials to:

  • understand the current configuration,
  • apply new parameter sets during tuning,
  • and track how the database is behaving.
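
The exact permissions DBtune needs are documented on their side; the sketch below only illustrates the shape of such a scoped-down setup with boto3. The policy name, user name and action list are our own illustration - narrow the Resource entries to your specific instance and parameter group ARNs:

```python
import json
import boto3

iam = boto3.client("iam")

# Illustrative only - not DBtune's official policy document.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances",
                "rds:DescribeDBParameterGroups",
                "rds:DescribeDBParameters",
                "rds:ModifyDBParameterGroup",
            ],
            "Resource": "*",  # restrict to your instance / parameter group ARNs
        },
        {
            "Effect": "Allow",
            "Action": ["cloudwatch:GetMetricData", "cloudwatch:GetMetricStatistics"],
            "Resource": "*",
        },
    ],
}

policy = iam.create_policy(
    PolicyName="dbtune-tuning-access",  # hypothetical name
    PolicyDocument=json.dumps(policy_document),
)
iam.create_user(UserName="dbtune-agent")  # hypothetical name
iam.attach_user_policy(UserName="dbtune-agent", PolicyArn=policy["Policy"]["Arn"])

# Finally, create access keys for this user and hand them to the agent.
keys = iam.create_access_key(UserName="dbtune-agent")
```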

3. Running the DBtune agent in Kubernetes via Qovery

DBtune provides a Docker image for their agent. In our case:

  • We deployed the agent as a pod inside our Kubernetes cluster,
  • Managed via Qovery, just like our other services.

The agent’s configuration is entirely done via environment variables, including:

  • AWS region
  • Target database / RDS instance identifier
  • Name of the parameter group that DBtune is allowed to change

Because the agent runs inside the same network boundary as our workloads, we didn’t have to touch security groups at all. Network-wise it was just “another internal service”.


Cloning the parameter group (and why this matters)

One important step before you let any tool change your parameters:

Never tune the shared AWS default parameter group.

The default parameter group can’t be modified directly anyway, and a group shared by multiple instances is risky to tune: you don’t want a tuning session for one database to unexpectedly affect others.

So we did:

  1. Clone the RDS default parameter group to a new custom one,
  2. Attach that custom parameter group only to our production database that we wanted to tune.

This way, DBtune has a dedicated playground - and we know exactly which database is being changed.
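
In the RDS console this is a couple of clicks; for reference, the same two steps look roughly like this with boto3 (group and instance identifiers are placeholders):

```python
import boto3

rds = boto3.client("rds")

# 1. Clone the default Postgres 17 parameter group into a dedicated custom group.
rds.copy_db_parameter_group(
    SourceDBParameterGroupIdentifier="default.postgres17",
    TargetDBParameterGroupIdentifier="core-prod-pg17-dbtune",      # placeholder name
    TargetDBParameterGroupDescription="Dedicated parameter group for DBtune tuning",
)

# 2. Attach the cloned group to the single instance we want to tune.
#    Note: static parameters from the new group only take effect after a reboot.
rds.modify_db_instance(
    DBInstanceIdentifier="core-prod",                              # placeholder identifier
    DBParameterGroupName="core-prod-pg17-dbtune",
    ApplyImmediately=True,
)
```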


Too many statements for a “Workload Fingerprint”

DBtune supports two different concepts for tuning a database: so-called "Workload Fingerprints" and global tuning. A “Workload Fingerprint” is a curated subset of queries or query patterns that generate the majority of the load. You can tell DBtune: “focus on these patterns when evaluating improvements.”

In our case, that didn’t work.

Our pg_stat_statements output is huge—hundreds of thousands of entries. The attempt to derive a meaningful Workload Fingerprint from this didn’t produce useful recommendations.

Rather than forcing it, we chose the other option they offer - a more global metric:

Optimize for average query runtime across the workload.

That’s a good fit for us because:

  • Our workload is diverse,
  • We care about overall latency and database health,
  • We didn’t want to artificially bias the tuning to a couple of hand-picked queries.
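
To make that metric concrete: the workload-wide average query runtime is essentially total execution time divided by total calls across pg_stat_statements. That’s not necessarily DBtune’s exact formula, but it’s the intuition, and it’s easy to track yourself (placeholder connection string):

```python
import psycopg2

conn = psycopg2.connect("postgresql://app:secret@my-rds-host:5432/mydb")  # placeholder DSN

with conn, conn.cursor() as cur:
    # Average runtime in milliseconds across the whole tracked workload.
    cur.execute("""
        SELECT sum(total_exec_time) / nullif(sum(calls), 0) AS avg_ms
        FROM pg_stat_statements
    """)
    print(f"average query runtime: {cur.fetchone()[0]:.2f} ms")
```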

Guard rails: don’t break production

One thing we liked immediately: DBtune has performance guard rails.

We configured:

  • Iteration duration: 15 minutes
  • Maximum allowed regression: 1.5× of the baseline average query time

In practical terms:

  • Each iteration runs for 15 minutes with a new configuration.
  • If the average query runtime during that iteration exceeds 150% of the baseline,
    • DBtune immediately aborts that iteration,
    • and proceeds to the next.

This gives you good confidence to run this against production.
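
Conceptually, a session with these settings boils down to the loop sketched below. This is our own illustration, not DBtune’s code; apply_config and measure_avg_runtime_ms are hypothetical helpers, and in reality the guard rail is evaluated continuously during the window rather than only at its end:

```python
import time

ITERATION_MINUTES = 15
MAX_ITERATIONS = 31
GUARD_RAIL_FACTOR = 1.5  # abort an iteration above 150% of the baseline average

def run_session(candidate_configs, baseline_avg_ms, apply_config, measure_avg_runtime_ms):
    """Illustrative iterate-observe-guard-rail loop."""
    best = (baseline_avg_ms, None)
    for i, config in enumerate(candidate_configs[:MAX_ITERATIONS], start=1):
        apply_config(config)                 # reload-only parameter change
        time.sleep(ITERATION_MINUTES * 60)   # observe real production traffic
        avg_ms = measure_avg_runtime_ms()
        if avg_ms > GUARD_RAIL_FACTOR * baseline_avg_ms:
            print(f"iteration {i}: guard rail hit at {avg_ms:.1f} ms, skipping")
            continue                         # move on to the next candidate
        if avg_ms < best[0]:
            best = (avg_ms, config)
            print(f"iteration {i}: new best at {avg_ms:.1f} ms")
    return best  # best measured runtime and the configuration that produced it
```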


Choosing the right day: Saturdays, not Mondays

Our traffic is very uneven:

  • Weekdays vs weekends,
  • Day vs night.

For the first tuning session, we wanted to:

  • See real traffic, not synthetic load,
  • But also be able to absorb temporary regressions without harming too many customers.

So we scheduled the session for a Saturday during daytime:

  • Enough real activity to be representative,
  • Lower overall load than a typical weekday.

We configured DBtune for the full session:

  • 31 iterations × 15 minutes
  • That nicely covers a full working day of tuning.

The results: 50% faster queries, much lower CPU

[Screenshot: DBtune summary of the first tuning session]

The outcome of the first tuning session was surprisingly strong. The yellow bars indicate warnings because the 1.5× guard rail was hit, but even those iterations ended up better than the default configuration. The reason: the queries that happened to run in that window were, by accident, faster than our average - again, our workload is very uneven.

Looking at our metrics across comparable days (e.g. a weekday before vs. a weekday after):

[Screenshot: RDS Performance Insights]

  • Average query runtime dropped by about 50%.
  • RDS CPU utilization went down significantly:
    • before: peaks around 80% on busy days,
    • after: more typically around 30–50%, even under comparable traffic.

We didn’t stop at a single day of data:

[Chart: CPU utilization]

  • We monitored the new configuration over several days and weeks,
  • The improvements stayed stable and consistent,
  • We didn’t see any weird regressions elsewhere.

From an application perspective, this translated into:

  • More headroom under load,
  • Smoother performance during peak usage,
  • And frankly: a healthier-looking database across the board.

What actually changed? (Without drowning in details)

DBtune touched a range of Postgres parameters—covering:

  • Memory settings (e.g. how much memory is available for caching and query execution),
  • Planner-related parameters (how aggressively the planner explores plan space),
  • WAL and checkpoint configuration,
  • Various runtime knobs that affect how Postgres balances throughput vs latency.

[Screenshot: Postgres parameters modified]

The exact values and combinations are less interesting than the meta-point:

This is the kind of configuration space that is tedious and risky to explore manually, but feels perfectly suited for an automated, iterative, data-driven approach.
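
If you want to see what a session changed on your own instance, pg_settings makes it easy to list everything that differs from Postgres’ compiled-in defaults (keep in mind that on RDS a number of values are already non-default out of the box, set by AWS):

```python
import psycopg2

conn = psycopg2.connect("postgresql://app:secret@my-rds-host:5432/mydb")  # placeholder DSN

with conn, conn.cursor() as cur:
    # Everything whose live value differs from the compiled-in default,
    # including changes applied through the RDS parameter group.
    cur.execute("""
        SELECT name, boot_val, setting, source
        FROM pg_settings
        WHERE setting IS DISTINCT FROM boot_val
          AND source NOT IN ('default', 'override')
        ORDER BY name
    """)
    for name, boot_val, setting, source in cur.fetchall():
        print(f"{name:32s} {boot_val} -> {setting}  ({source})")
```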


Second tuning session: diminishing returns

After the success of the first run, we scheduled a second tuning session, this time during the week.

We kept a similar setup:

  • 31 iterations,
  • 15-minute windows,
  • Same guard rail configuration.

This time, however, the outcome was:

  • No meaningful additional improvement,
  • Essentially a “plateau”.

Why? Two plausible explanations:

  1. Traffic during the tuning window wasn’t representative.

    A 15-minute window is small; if the pattern during that time doesn’t align with your typical load mix, improvements may not generalize.

  2. DBtune already found a near-optimal configuration in the first run.

    At some point, changing parameters further will mostly shuffle performance between specific edge cases without improving the global picture.

In practice, we were happy with “no change.” It confirmed that:

  • The first tuning session took us to a good place,
  • We weren’t leaving obvious gains on the table.

Why we like automated Postgres tuning

The experience with DBtune highlighted a few things for us:

1. Postgres tuning is still rocket science

There are a lot of Postgres parameters. Many of them interact in non-obvious ways. On RDS, some are also managed or constrained by AWS.

If you’re not spending serious time deep-diving into:

  • shared_buffers, work_mem, effective_cache_size,
  • parallelism and planner cost settings,
  • WAL and checkpoint tuning,
  • autovacuum and maintenance knobs,

you’re unlikely to systematically explore all useful combinations for your specific workload.

Letting a machine do this iteratively, with real workload data, just makes sense.

2. Production-safe experimentation

The combination of:

  • Short iteration windows,
  • Measured metrics per iteration,
  • Hard guard rails on regression,

made it feel safe to run this on our production database without babysitting every minute.

3. Time and focus

The actual human time we invested was modest:

  • Registering and onboarding,
  • Deploying the agent in our Kubernetes cluster via Qovery,
  • Setting up IAM permissions and the cloned parameter group,
  • Testing the setup on a staging environment,
  • Running the tuning session on production,
  • Monitoring and sanity-checking results.

Compared to a manual tuning project, that’s very little.

4. Free tier fits experimentation

DBtune’s free tier currently lets you tune up to three databases before you need a paid plan. That’s enough to try it on:

  • A staging environment,
  • One or two production instances,
  • Or multiple services if you run several Postgres clusters.

Given the potential upside in performance and cost, the cost–benefit ratio is very attractive.


Lessons learned and recommendations

If you’re running Postgres on AWS RDS, here are some takeaways from our journey:

  1. Clone defaults into a dedicated parameter group.

    Never tune the shared default; always work on a per-instance group.

  2. Invest in observability first.

    Make sure pg_stat_statements is installed and working well before tuning.

  3. Use guard rails when tuning in production.

    Whether via DBtune or another tool, enforce clear limits on acceptable regressions.

  4. Pick realistic windows and times.

    We had great success running our first session on a low-to-medium traffic Saturday rather than at absolute peak load.

  5. Validate over weeks, not just hours.

    Don’t declare victory based on a single day. Track after-effects and stability.

  6. Don’t feel bad about not being a Postgres tuning expert.

    Even if you’re comfortable with Postgres, there’s a lot of subtlety. Offloading the search to an automated system is not a cop-out - it’s efficient.


What’s next for us

We’re in touch with the DBtune team and had a great experience with their support and responsiveness. They’re continuing to improve how they handle very large pg_stat_statements datasets, and we’re hoping that in the future we’ll be able to use more advanced “Workload Fingerprint”-based tuning on top of our global optimization.

For now, the outcome is simple:

  • Our database is faster.
  • Our CPU headroom is healthier.
  • Our customers benefit from a more responsive system.
  • And we didn’t have to turn ourselves into full-time Postgres tuning experts to get there.

If you’re on RDS, sitting on the default parameter group, and you’ve been putting off tuning because it feels like rocket science... We’d say: let a robot take the first shot.
