<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: pampatzoglou</title>
    <description>The latest articles on DEV Community by pampatzoglou (@pampatzoglou).</description>
    <link>https://dev.to/pampatzoglou</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F521431%2F7f62a21a-c07d-4a69-a927-d02efe327593.png</url>
      <title>DEV Community: pampatzoglou</title>
      <link>https://dev.to/pampatzoglou</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pampatzoglou"/>
    <language>en</language>
    <item>
      <title>Operational Considerations for Managing Stateful Workloads</title>
      <dc:creator>pampatzoglou</dc:creator>
      <pubDate>Tue, 04 Feb 2025 10:17:04 +0000</pubDate>
      <link>https://dev.to/pampatzoglou/operational-considerations-for-managing-stateful-workloads-20c3</link>
      <guid>https://dev.to/pampatzoglou/operational-considerations-for-managing-stateful-workloads-20c3</guid>
      <description>&lt;p&gt;When managing stateful workloads, whether in Kubernetes or traditional infrastructure, operational concerns like isolation, lifecycle management, security, disaster recovery, scalability, and observability take center stage. While the examples focus on AWS, PostgreSQL, and Kubernetes, the principles and best practices discussed here are broadly applicable to any environment. This article approaches these topics from an  &lt;strong&gt;operations perspective&lt;/strong&gt;, prioritizing reliability, maintainability, and resilience. The goal is not just to run a database, but to ensure it operates efficiently, scales properly, and remains secure in real-world conditions. We’ll explore key aspects of running stateful workloads, from managing failure domains to ensuring observability, and how these impact both operations teams and developers. Whether you’re running a database in a cloud-native setup or on bare metal, these strategies will help you build a robust, well-managed system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Isolation&lt;/li&gt;
&lt;li&gt;Lifecycle management&lt;/li&gt;
&lt;li&gt;Security&lt;/li&gt;
&lt;li&gt;Disaster Recovery&lt;/li&gt;
&lt;li&gt;Scalability&lt;/li&gt;
&lt;li&gt;Prepare for the Ugly&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Stakeholders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive in.&lt;/p&gt;

&lt;h1&gt;
  
  
  Isolation
&lt;/h1&gt;

&lt;p&gt;We focus here on the tenant isolation options for a database.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Shared Database, Shared Schema&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;All tenants use the same database and the same set of tables.&lt;/li&gt;
&lt;li&gt;Each entry includes a &lt;code&gt;tenant_id&lt;/code&gt; column to segregate data.&lt;/li&gt;
&lt;li&gt;Resource efficiency is debatable, because the &lt;code&gt;WHERE tenant_id&lt;/code&gt; filter adds overhead to every query.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simplifies deployment and maintenance.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires strict tenant-aware data access controls, for example a row-level ACL (implementation follows).&lt;/li&gt;
&lt;li&gt;Certain performance bottlenecks as the tenant count grows.&lt;/li&gt;
&lt;li&gt;Extremely bad in a DR scenario:
&lt;ul&gt;
&lt;li&gt;All users are affected, because you need to roll back the ENTIRE database to the last known good state.&lt;/li&gt;
&lt;li&gt;Botched operations are global.&lt;/li&gt;
&lt;li&gt;Global recovery increases MTTR.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
One can use Row-Level Security (RLS) in Postgres to get a weak form of isolation, but it is not true isolation. We will dive into this subject as part of security.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Using table prefixes for sharding tenants is a bad idea. &lt;strong&gt;DON'T&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Shared Database, Separate Schema&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each tenant has its own schema within a shared database. Table names and other objects within each schema should be consistent across tenants to simplify development.&lt;/li&gt;
&lt;li&gt;Schema-based isolation improves security and performance compared to a shared schema. From a security perspective, however, this isolation is still considered weak because you need to protect yourself from:

&lt;ul&gt;
&lt;li&gt;Schema Privilege Escalation&lt;/li&gt;
&lt;li&gt;Cross-Schema Injection Risks&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Pros:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Better sharding of data while sharing infrastructure.&lt;/li&gt;
&lt;li&gt;Queries no longer need the &lt;code&gt;WHERE tenant_id&lt;/code&gt; filter, which removes that per-query overhead.&lt;/li&gt;
&lt;li&gt;Allows per-tenant schema customizations (avoid them if possible).&lt;/li&gt;
&lt;li&gt;Future-proof: tenants can later be moved to more isolated setups.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Cons:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;More complex migrations and schema updates.&lt;/li&gt;
&lt;li&gt;If not properly managed, restoring one schema could expose data from others due to shared connections.&lt;/li&gt;
&lt;li&gt;Increased administrative overhead, but solvable.&lt;/li&gt;
&lt;li&gt;You still need strict database logging and alerting to detect unauthorized access, because all tenants share the same database process.&lt;/li&gt;
&lt;li&gt;Backups are more complicated, because each schema needs a dedicated backup process.&lt;/li&gt;
&lt;li&gt;You can't use volume snapshots for backups. Restoring one reverts the global state, which affects all tenants.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Separate Databases per Tenant, Shared database instance&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each tenant has its dedicated database but all tenants use the same CPUs.&lt;/li&gt;
&lt;li&gt;Ensures complete data isolation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;All the benefits of the Shared Database, Separate Schema setup, plus:&lt;/li&gt;
&lt;li&gt;Strong ACL offers improved security.&lt;/li&gt;
&lt;li&gt;Eases tenant-specific backups and compliance handling.&lt;/li&gt;
&lt;li&gt;Simplest and fastest DR.&lt;/li&gt;
&lt;li&gt;DR is not global.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Cons:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Migration and schema updates are still complex.&lt;/li&gt;
&lt;li&gt;Higher infrastructure and management overhead. A solution is discussed later in this article.&lt;/li&gt;
&lt;li&gt;Efficient scaling needs more thought.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Separate database instance per tenant
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each tenant runs on its own database instance, probably in a dedicated namespace, or perhaps in a different region. At this level of isolation, the exact placement doesn't matter.&lt;/li&gt;
&lt;li&gt;Even stronger isolation, because you rely not only on database-level ACLs but also on network policies and firewall rules.&lt;/li&gt;
&lt;li&gt;In essence it's a special case of the 3rd option with:&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Even stronger isolation&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capacity planning and optimization overhead will be significant.&lt;/li&gt;
&lt;li&gt;To benefit from this level, you need to ensure that the underlying infrastructure is locked down so that your processes are not exposed to other tenants of the same cloud infrastructure, e.g.:
&lt;/li&gt;
&lt;/ul&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;NodeLaunchTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::EC2::LaunchTemplate&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;LaunchTemplateData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;MetadataOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;HttpPutResponseHopLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;HttpTokens&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;required&lt;/span&gt;

  &lt;span class="na"&gt;EksCluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::EKS::Cluster&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${AWS::StackName}"&lt;/span&gt;
      &lt;span class="na"&gt;RoleArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;ClusterIamRole.Arn&lt;/span&gt;
      &lt;span class="na"&gt;EncryptionConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
            &lt;span class="na"&gt;KeyArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;ClusterSecretsKMSKey.Arn&lt;/span&gt;
          &lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;secrets&lt;/span&gt;
      &lt;span class="na"&gt;Logging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ClusterLogging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;EnabledTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;audit&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;authenticator&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;controllerManager&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scheduler&lt;/span&gt;
      &lt;span class="na"&gt;ResourcesVpcConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;EndpointPublicAccess&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;EndpointPrivateAccess&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="c1"&gt;# IMDSv2 options (HttpTokens, HttpPutResponseHopLimit) are not AWS::EKS::Cluster&lt;/span&gt;
      &lt;span class="c1"&gt;# properties; they are set on the node launch template above.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;In a Kubernetes context, each instance should run with its own unique security context, with dropped capabilities, and under AppArmor or SELinux, e.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
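  &lt;span class="c1"&gt;# Instance A. Note: fsGroup is a pod-level field, while readOnlyRootFilesystem&lt;/span&gt;
  &lt;span class="c1"&gt;# and allowPrivilegeEscalation are container-level fields.&lt;/span&gt;
  &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;
    &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;
    &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2000&lt;/span&gt;
    &lt;span class="na"&gt;readOnlyRootFilesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
  &lt;span class="c1"&gt;# Instance B: same hardening, different user and group IDs.&lt;/span&gt;
  &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1001&lt;/span&gt;
    &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3001&lt;/span&gt;
    &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2001&lt;/span&gt;
    &lt;span class="na"&gt;readOnlyRootFilesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;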
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
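
&lt;p&gt;As a rough sketch of how the dropped capabilities and AppArmor pieces mentioned above could fit into a full Pod spec, consider the following. The pod name, image, and profile choices are assumptions for illustration, not taken from a real deployment. The &lt;code&gt;appArmorProfile&lt;/code&gt; field assumes Kubernetes 1.30 or newer (older clusters use an annotation instead), and a real database pod would also need writable volumes for its data and runtime directories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-db&lt;/span&gt;  &lt;span class="c1"&gt;# hypothetical name&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Pod-level settings: identity and default seccomp profile.&lt;/span&gt;
  &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;
    &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;
    &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2000&lt;/span&gt;
    &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:16&lt;/span&gt;  &lt;span class="c1"&gt;# assumed image&lt;/span&gt;
      &lt;span class="c1"&gt;# Container-level settings: no escalation, no extra capabilities.&lt;/span&gt;
      &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;readOnlyRootFilesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ALL&lt;/span&gt;
        &lt;span class="na"&gt;appArmorProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# assumes Kubernetes 1.30 or newer&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Giving each instance its own &lt;code&gt;runAsUser&lt;/code&gt;, &lt;code&gt;runAsGroup&lt;/code&gt;, and &lt;code&gt;fsGroup&lt;/code&gt; values, as in the fragments above, means that a process escaping one instance does not share file ownership with another.&lt;/p&gt;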



&lt;h1&gt;
  
  
  Lifecycle management
&lt;/h1&gt;

&lt;p&gt;Use two different Jobs to handle the distinct aspects of the application lifecycle. The first Job runs only on installation; its single responsibility is to create the database with the required parameters. Because it uses the &lt;code&gt;pre-install&lt;/code&gt; hook, it runs before any application-specific resources are created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Job&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "chart.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-db-create-{{ randAlphaNum 5 | lower }}&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helm.sh/hook"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pre-install&lt;/span&gt;
    &lt;span class="s"&gt;"helm.sh/hook-weight"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-20"&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helm.sh/hook-delete-policy"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;before-hook-creation&lt;/span&gt;
    &lt;span class="s"&gt;"argocd.argoproj.io/hook"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PreSync"&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;argocd.argoproj.io/hook-delete-policy"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BeforeHookCreation"&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;argocd.argoproj.io/job-cleanup"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keep"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-create&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second Job handles migrations: it runs after &lt;code&gt;db-create&lt;/code&gt; and sets up the schema the application needs to function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Job&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "chart.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-db-migrate-{{ randAlphaNum 5 | lower }}&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helm.sh/hook"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pre-install,pre-upgrade&lt;/span&gt;
    &lt;span class="s"&gt;"helm.sh/hook-weight"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-10"&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helm.sh/hook-delete-policy"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;before-hook-creation&lt;/span&gt;
    &lt;span class="s"&gt;"argocd.argoproj.io/hook"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sync"&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;argocd.argoproj.io/hook-delete-policy"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BeforeHookCreation"&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;argocd.argoproj.io/job-cleanup"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keep"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-migrate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sequence of events should look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghnp21fe5u4893wrmcf4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghnp21fe5u4893wrmcf4.png" alt="Image description" width="800" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a deeper dive into the inner workings of databases, I highly recommend: &lt;a href="https://www.udemy.com/course/database-engines-crash-course/?couponCode=KEEPLEARNING" rel="noopener noreferrer"&gt;Fundamentals of Database Engineering&lt;/a&gt; paired with &lt;a href="https://www.amazon.com/Database-Internals-Deep-Distributed-Systems/dp/1492040347" rel="noopener noreferrer"&gt;Database Internals&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Security
&lt;/h1&gt;

&lt;p&gt;Security has several aspects, which we cover in turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Access
&lt;/h2&gt;

&lt;p&gt;In a nutshell, access is who can knock on your door, meaning who can reach the port the database listens on. This might be enforced by security groups, network policies, etc., but in essence we are talking about firewalls here. Following is an example of a NetworkPolicy (Kubernetes firewall) for the primary database pod. It allows the read replicas in the same namespace and Vault from the vault namespace to reach PostgreSQL on port 5432, and Prometheus from the monitoring namespace to reach the metrics port 9187.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-db-postgresql-primary-ingress&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/instance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-db&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/component&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;primary&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/instance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-db&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/component&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;primary&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/instance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-db&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/component&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;kubernetes.io/metadata.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault&lt;/span&gt;
          &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/instance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault&lt;/span&gt;
      &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;kubernetes.io/metadata.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
          &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus&lt;/span&gt;
      &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9187&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Keep in mind that you will probably need to give Grafana or Metabase etc access to your database. A good approach is to do this using the replicas to ensure that a big query doesn't take down prod.&lt;/p&gt;
&lt;/blockquote&gt;
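
&lt;p&gt;A minimal sketch of what that could look like for the read replicas follows. It reuses the &lt;code&gt;app-db&lt;/code&gt; labels from the policy above; the policy name, the assumption that Grafana runs in the monitoring namespace, and the &lt;code&gt;grafana&lt;/code&gt; pod label are illustrative guesses that need to be adapted to your setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-db-postgresql-read-ingress&lt;/span&gt;  &lt;span class="c1"&gt;# hypothetical name&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Select only the read replicas, never the primary.&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/instance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-db&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/component&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;policyTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;kubernetes.io/metadata.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
          &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana&lt;/span&gt;  &lt;span class="c1"&gt;# assumed label&lt;/span&gt;
      &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the primary's policy does not list Grafana at all, a heavy dashboard query can only ever land on a replica.&lt;/p&gt;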

&lt;h2&gt;
  
  
  Authentication
&lt;/h2&gt;

&lt;p&gt;Authentication is what happens after someone knocks on your door (port): they must prove, in an acceptable format, that they are who they claim to be. There are two main ways to do this, but whichever you select, you want a safe, central place where authentication is controlled. If you choose to mix and match, keep a clear separation between the methods and a clear understanding of why each is used. The worst thing you can do here is to allow multiple ways of authenticating to the same resources, because you will lose track of who has access. Also, to avoid surprises, make sure that your authentication methods work for both production requirements and developers.&lt;br&gt;
Personally, I recommend role-based (service) authentication between the services offered by your cloud provider and your cluster's workloads, and credentials between your engineering teams, your applications, and the databases.&lt;/p&gt;
&lt;h3&gt;
  
  
  Authentication through roles
&lt;/h3&gt;

&lt;p&gt;Authentication through roles means that you trust either a service's metadata or its serviceAccount to allow it to perform a particular action.&lt;/p&gt;
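
&lt;p&gt;As one illustration on EKS, IAM Roles for Service Accounts ties an IAM role to a Kubernetes ServiceAccount through an annotation. The sketch below assumes that the cluster's OIDC provider and the role's trust policy are already configured; the names, account ID, and role are placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;  &lt;span class="c1"&gt;# hypothetical name&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;  &lt;span class="c1"&gt;# hypothetical namespace&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Pods running under this ServiceAccount can assume the IAM role below.&lt;/span&gt;
    &lt;span class="c1"&gt;# The account ID and role name are placeholders.&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::111122223333:role/app-db-access&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
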
&lt;h3&gt;
  
  
  Authentication through credentials
&lt;/h3&gt;

&lt;p&gt;Authentication through credentials means that the service or user can produce a username/password combination that allows it to connect. A heads-up: it is safer for an application to read credentials from files than from environment variables, for the simple reason that anyone who can run &lt;code&gt;ps env&lt;/code&gt; can read the entire environment of the container, and thus the credentials. Also, consider that dynamic credentials are far better than static ones: with dynamic credentials, nothing needs to be saved in the code or reused through a &lt;code&gt;.env&lt;/code&gt; file that might be shared between team members.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not try and bend the spoon, that's impossible. Instead, try to realize the truth, there is no spoon.&lt;/p&gt;
&lt;/blockquote&gt;
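
&lt;p&gt;To illustrate the file-based approach, here is a minimal sketch of a Pod that mounts a credentials Secret as read-only files instead of injecting it through environment variables. The pod name, image, Secret name, mount path, and the &lt;code&gt;DB_PASSWORD_FILE&lt;/code&gt; convention are all assumptions; the application has to be written to read its credentials from that path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;  &lt;span class="c1"&gt;# hypothetical name&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.example.com/app:1.0.0&lt;/span&gt;  &lt;span class="c1"&gt;# placeholder image&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Only the path is exposed in the environment, never the secret value.&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD_FILE&lt;/span&gt;  &lt;span class="c1"&gt;# assumed application convention&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/db/DB_PASSWORD&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-credentials&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/db&lt;/span&gt;
          &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Each key of the Secret becomes one file under the mount path.&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-credentials&lt;/span&gt;
      &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-db-runtime-credentials&lt;/span&gt;  &lt;span class="c1"&gt;# hypothetical Secret name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
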

&lt;p&gt;Now let's get our hands dirty. In reality, your application should have two processes: one that handles the schema through migrations, and a second for normal operation. This means that you will have some secrets in the cluster that look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "chart.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-db-migrate-credentials&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;span class="na"&gt;stringData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;DB_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;DB_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "chart.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-db-runtime-credentials&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;span class="na"&gt;stringData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;DB_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;DB_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are several approaches to facilitating this but here I will dive into my favourite.&lt;/p&gt;

&lt;h4&gt;
  
  
  HashiCorp Vault
&lt;/h4&gt;

&lt;p&gt;I consider Vault the root user of all databases. It gives access to all of them and with secret engines, eg &lt;a href="https://developer.hashicorp.com/vault/docs/secrets/databases/postgresql" rel="noopener noreferrer"&gt;postgres&lt;/a&gt; it can generate &lt;a href="https://developer.hashicorp.com/vault/docs/secrets/databases/postgresql" rel="noopener noreferrer"&gt;ephimeral credentials&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These credentials can be mapped to IAM roles, or delivered through the &lt;a href="https://external-secrets.io/latest/introduction/overview/" rel="noopener noreferrer"&gt;external secrets operator&lt;/a&gt; or the Vault Secrets Operator. Whichever approach you choose, the end result is a short-lived username/password that can access a single database or schema with only the required permissions. Keep in mind that these should differ between the process that handles the schema (which needs elevated access, e.g. to create a table) and the normal app role, which should only be allowed to read/write/update but never delete or drop. The beauty of this is that, from the developer's perspective, the application uses the same secret (though with different values), so they do not need to carry the overhead of this logic in the application code. The downside of this approach is that when the values change, the pod has to be restarted, which may cascade and run into issues with PodDisruptionBudgets (PDBs).&lt;/p&gt;

&lt;p&gt;An example of generating the aforementioned secrets using Vault and the external secrets operator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vault write database/config/my-postgresql-database &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;plugin_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgresql-database-plugin &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;connection_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgresql://{{username}}:{{password}}@your-db-host:5432/postgres?sslmode=disable"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;allowed_roles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"db-admin, runtime-user"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"vaultuser"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"vaultpassword"&lt;/span&gt;

vault write database/roles/db-admin &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;db_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-postgresql-database &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;creation_statements&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"CREATE ROLE &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';
        GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        GRANT CREATE ON SCHEMA public TO &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;revocation_statements&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public FROM &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        REVOKE CREATE ON SCHEMA public FROM &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        DROP OWNED BY &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        DROP ROLE &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;default_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"5m"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;max_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"1h"&lt;/span&gt;

vault write database/roles/runtime-user &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;db_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-postgresql-database &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;creation_statements&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"CREATE ROLE &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; WITH LOGIN PASSWORD '{{password}}';
        GRANT CONNECT ON DATABASE mydb TO &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        GRANT USAGE ON SCHEMA public TO &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;revocation_statements&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public FROM &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        DROP OWNED BY &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
        DROP ROLE &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;{{name}}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;default_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"30m"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;max_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"1h"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
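&lt;p&gt;The TTLs above imply that whatever consumes these credentials has to fetch a fresh pair before the old one expires. As a rough sketch (this is not part of Vault itself), the Python helper below picks a refresh time at a fraction of the lease duration. The function name and the two-thirds ratio are assumptions chosen for illustration.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def next_refresh(issued_at: datetime, lease_seconds: int, ratio: float = 2 / 3) -> datetime:
    """Return the moment at which new credentials should be requested.

    Hypothetical helper: it refreshes after `ratio` of the lease has elapsed,
    leaving the rest of the lease as a safety margin before expiry.
    """
    if not lease_seconds > 0:
        raise ValueError("lease_seconds must be positive")
    if not (ratio > 0 and 1 >= ratio):
        raise ValueError("ratio must be in the range (0, 1]")
    return issued_at + timedelta(seconds=lease_seconds * ratio)


if __name__ == "__main__":
    issued = datetime(2025, 2, 4, 10, 0, tzinfo=timezone.utc)
    # The runtime-user role above uses default_ttl = 30m.
    print("refresh at:", next_refresh(issued, 30 * 60).isoformat())
```

&lt;p&gt;A shorter ratio means more frequent rotations (and more restarts if the pod has to be recycled), while a longer one leaves less room for a slow or failed refresh.&lt;/p&gt;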



&lt;p&gt;Then, by using some facilitator service, you can have dynamic secrets where they need to be. The ones most often used are either the Vault agent, which uses annotations to set up the credentials for the pod, or the external secrets operator, which interacts with Vault and generates the Kubernetes secret that will be used. See also some more &lt;a href="https://dev.to/breda/dynamic-postgresql-credentials-using-hashicorp-vault-with-php-symfony-go-examples-4imj"&gt;examples&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VaultProvider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault-provider&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;vault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-vault-address:8200"&lt;/span&gt;
    &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;tokenSecretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault-token&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ExternalSecret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-admin-credentials&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;vault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database/creds/db-admin"&lt;/span&gt;  &lt;span class="c1"&gt;# Use the appropriate role for DB Admin or Runtime User&lt;/span&gt;
      &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-vault-address:8200"&lt;/span&gt;
      &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;tokenSecretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault-token&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
  &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault-provider&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-admin-credentials&lt;/span&gt;
    &lt;span class="na"&gt;creationPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Owner&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_USER&lt;/span&gt;
        &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database/creds/db-admin&lt;/span&gt;
          &lt;span class="na"&gt;property&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data.username&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD&lt;/span&gt;
        &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database/creds/db-admin&lt;/span&gt;
          &lt;span class="na"&gt;property&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data.password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_NAME&lt;/span&gt;
        &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database/creds/db-admin&lt;/span&gt;
          &lt;span class="na"&gt;property&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data.dbname&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ExternalSecret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-runtime-credentials&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;vault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database/creds/runtime-user"&lt;/span&gt;  &lt;span class="c1"&gt;# Use the appropriate role for DB Admin or Runtime User&lt;/span&gt;
      &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-vault-address:8200"&lt;/span&gt;
      &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;tokenSecretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault-token&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
  &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault-provider&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-runtime-credentials&lt;/span&gt;
    &lt;span class="na"&gt;creationPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Owner&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_USER&lt;/span&gt;
        &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database/creds/runtime-user&lt;/span&gt;
          &lt;span class="na"&gt;property&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data.username&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD&lt;/span&gt;
        &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database/creds/runtime-user&lt;/span&gt;
          &lt;span class="na"&gt;property&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data.password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_NAME&lt;/span&gt;
        &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database/creds/runtime-user&lt;/span&gt;
          &lt;span class="na"&gt;property&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data.dbname&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main problem is that, by the nature of Vault, not all of this can be automated. It can be scripted, but a person who actually has access to Vault will need to be involved to run the required configuration, unless you create a custom operator that has admin Vault access and performs the required actions to set up the particular engine, paths, etc. A better approach is to use the Vault agent to automate secret management.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sb-k8s-template&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/agent-inject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;myapp-k8s-role"&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/agent-inject-secret-myapp-db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;myapp-db/creds/myapp-db-role"&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/agent-inject-file-myapp-db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;myapp-db.creds"&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/auth-path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth/kubernetes"&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/agent-run-as-user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1881"&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/agent-pre-populate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
        &lt;span class="na"&gt;vault.hashicorp.com/agent-pre-populate-only&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false"...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whichever method you select, keep in mind that in the end the client gets a token/key to use: basically, a passport that they will then present for authorization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authorization
&lt;/h2&gt;

&lt;p&gt;Authorization is what happens after a user or a process uses their credentials or token and has some form of access. So after a connection has been established and authentication is complete, authorization answers the question: what can I do?&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared Database, Shared Schema
&lt;/h3&gt;

&lt;p&gt;Previously we mentioned Row-Level Security (RLS). RLS is a way to define, in a shared database with shared tables, who can do what. In essence, you want to block tenant1 from reading or writing entries that belong to tenant2. RLS achieves this, but at a computational cost for the database. For &lt;strong&gt;each query&lt;/strong&gt; the database needs to check whether the connection has the required access to each row, and then perform the actual query with the aforementioned &lt;code&gt;WHERE tenant_id&lt;/code&gt; filter. All this adds latency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;--- Enable Row-Level Security (RLS) on the target table.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;--- Create a Security Policy for Tenants (restrict access to own rows).&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;tenant_row_access&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.current_tenant'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;--- Ensure INSERT operations only allow the correct tenant_id.&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;tenant_insert_policy&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.current_tenant'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;--- Grant basic permissions on the orders table to tenant-specific roles.&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;tenant_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;--- Set the tenant ID in the session (should be done by the application per request); it must be a valid UUID because the policies cast it.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.current_tenant'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'11111111-1111-1111-1111-111111111111'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;--- Example: the current tenant querying the orders table (will only return their own rows).&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;--- Example: Trying to insert an order for another tenant (should fail).&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'99999999-9999-9999-9999-999999999999'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Test Order'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
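&lt;p&gt;Since the policies compare &lt;code&gt;tenant_id&lt;/code&gt; against a UUID, the application has to supply a well-formed value on every request. Here is a minimal Python sketch of that step, assuming a DB-API style driver that accepts &lt;code&gt;%s&lt;/code&gt; placeholders (the helper name is hypothetical). It validates the tenant ID and returns a parameterized &lt;code&gt;set_config&lt;/code&gt; statement. Note that it passes &lt;code&gt;true&lt;/code&gt; as the third argument, unlike the SQL above, so the setting lasts only for the current transaction instead of the whole session.&lt;/p&gt;

```python
import uuid


def tenant_scope_statement(tenant_id: str) -> tuple[str, tuple[str]]:
    """Build the statement that pins the current transaction to one tenant.

    Hypothetical helper. Raises ValueError when tenant_id is not a valid UUID,
    so malformed input never reaches the database.
    """
    canonical = str(uuid.UUID(tenant_id))
    # Third argument true = is_local: the value is reset when the transaction ends.
    sql = "SELECT set_config('app.current_tenant', %s, true)"
    return sql, (canonical,)


if __name__ == "__main__":
    sql, params = tenant_scope_statement("3F2504E0-4F89-11D3-9A0C-0305E82C3301")
    # With a DB-API cursor this would run first inside the request's
    # transaction, e.g. cursor.execute(sql, params), before any tenant query.
    print(sql, params)
```

&lt;p&gt;Scoping the setting to the transaction is a design choice that matters with connection pools: a session-level value could otherwise leak from one request to the next on a reused connection.&lt;/p&gt;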



&lt;p&gt;Nevertheless, even with RLS the two issues remain:&lt;/p&gt;

&lt;ol&gt;
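&lt;li&gt;It's considered easy to bypass: by default, superusers and the table owner are not subject to RLS, and any session that can run arbitrary SQL can set &lt;code&gt;app.current_tenant&lt;/code&gt; to another tenant's ID.&lt;/li&gt;
&lt;li&gt;You will still need to create a migrations-like process that will generate and update the RLS policies as you add new tenants.&lt;/li&gt;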
&lt;/ol&gt;

&lt;p&gt;I would suggest experimenting with using &lt;code&gt;EXPLAIN&lt;/code&gt; and &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on your data to see how it works on your system. Depending on the scale, some organizations will be happy to accept the RLS overhead, and for some, it will be a significant price that will drive them to adopt a different approach. From experience what tends to happen is that organizations start by using RLS and at some point start to split their clients into higher tiers where multitenancy is offered.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared Database, Separate Schema or better
&lt;/h3&gt;

&lt;p&gt;Once you have stopped using a shared database with a shared schema, the authorization portion becomes rather simple and fast. This is because the access rules are now applied once, when the connection to the database is opened, and no longer need to be re-calculated for each query that connection makes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Disaster Recovery
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Disaster Recovery (DR)&lt;/strong&gt; is a set of policies, tools, and procedures designed to restore IT infrastructure and operations after a disaster (e.g., cyberattacks, hardware failures, natural disasters, or human errors). It ensures business continuity by minimizing downtime and data loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Components of Disaster Recovery&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Backup &amp;amp; Restore&lt;/strong&gt; – Regular data backups to on-site, off-site, or cloud storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disaster Recovery Plan (DRP)&lt;/strong&gt; – A documented strategy outlining recovery steps, responsibilities, and timelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recovery Time Objective (RTO)&lt;/strong&gt; – Maximum acceptable downtime before services must be restored.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recovery Point Objective (RPO)&lt;/strong&gt; – Maximum acceptable data loss measured in time (e.g., last backup timestamp).&lt;/li&gt;
&lt;/ol&gt;
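&lt;p&gt;RTO and RPO are only useful if they are checked against real numbers. As a small illustration, the Python sketch below compares the age of the latest backup with the RPO, and the length of an outage with the RTO. The function names and the one-hour and four-hour targets are assumptions made up for the example.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def rpo_met(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest backup is recent enough to satisfy the RPO."""
    return rpo >= now - last_backup


def rto_met(outage_start: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """True when service was restored within the RTO."""
    return rto >= restored_at - outage_start


if __name__ == "__main__":
    now = datetime(2025, 2, 4, 12, 0, tzinfo=timezone.utc)
    last_backup = now - timedelta(minutes=45)
    # Hypothetical targets: lose at most 1h of data, be back within 4h.
    print("RPO met:", rpo_met(last_backup, now, timedelta(hours=1)))
    print("RTO met:", rto_met(now, now + timedelta(hours=5), timedelta(hours=4)))
```

&lt;p&gt;A check like the first one can run continuously as an alert, while the second is the kind of measurement a recovery drill should produce.&lt;/p&gt;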

&lt;p&gt;In my opinion, a common misconception is that HA setups offer DR. While they can be handy in some cases, for example by mitigating restarts through failovers and redundancy, replicas should not be considered disaster recovery sources, for the simple reason that they might also be corrupted or lost. Disaster recovery should mean that you have a path back from being completely compromised. In a nutshell, HA prevents downtime; DR recovers from disaster.&lt;/p&gt;

&lt;p&gt;In practice, your first concern is backup. Whether you use a cronjob to perform a series of dumps, or volume snapshots taken by the database or through Velero, you will be doing something similar to&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backup.velero.io/backup-volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup&lt;/span&gt;
  &lt;span class="na"&gt;post.hook.restore.velero.io/command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;["/bin/bash",&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"-c",&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"[&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;\"/scratch/backup.sql\"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PGPASSWORD=$POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;psql&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-U&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$POSTGRES_USER&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-h&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-d&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$POSTGRES_DATABASE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/scratch/backup.sql&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/scratch/backup.sql;"]'&lt;/span&gt;
  &lt;span class="na"&gt;pre.hook.backup.velero.io/command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;["/bin/bash",&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"-c",&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"PGPASSWORD=$POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pg_dump&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-U&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$POSTGRES_USER&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-d&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$POSTGRES_DATABASE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-h&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/scratch/backup.sql&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mkdir&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-p&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/bitnami/postgresql/backups&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mv&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/scratch/backup.sql&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/bitnami/postgresql/backups"]'&lt;/span&gt;
  &lt;span class="na"&gt;pre.hook.backup.velero.io/timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can make these hooks as complex as required, using your RTO and RPO as guides. What you really need to pay attention to is WHERE the backups are stored and WHY that location is safe. To be honest, the only real solution here is some form of &lt;a href="https://aws.amazon.com/blogs/aws/glacier-vault-lock/" rel="noopener noreferrer"&gt;vault lock&lt;/a&gt; with WORM capability. For example, Amazon S3 supports WORM (Write Once Read Many) through S3 Object Lock, which prevents objects from being changed or deleted for a retention period after they have been written. This feature is useful for regulatory compliance and data protection. In practice, with Object Lock in compliance mode, no user (including the ROOT user) can overwrite or delete a protected object version during the retention period, so the backups stay safe even if the ROOT account of the cloud provider gets compromised. You will also need to configure lifecycle policies to ensure that costs don't skyrocket. Even if you are running on-prem, consider using something like &lt;a href="https://aws.amazon.com/privatelink/" rel="noopener noreferrer"&gt;AWS PrivateLink&lt;/a&gt; to link your local network and store your backups in the cloud.&lt;/p&gt;
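&lt;p&gt;As a rough illustration of the storage side, the fragment below is a minimal CloudFormation sketch of a backup bucket with Object Lock in compliance mode and a lifecycle rule to contain costs. The logical resource name, the 30-day retention, and the transition and expiration periods are assumptions chosen for the example, not recommendations; derive real values from your own RPO and retention requirements.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;Resources:
  BackupBucket:                  # hypothetical logical name
    Type: AWS::S3::Bucket
    Properties:
      ObjectLockEnabled: true    # enabling Object Lock also enables versioning
      ObjectLockConfiguration:
        ObjectLockEnabled: Enabled
        Rule:
          DefaultRetention:
            Mode: COMPLIANCE     # protected versions cannot be deleted during retention
            Days: 30             # assumed retention period
      LifecycleConfiguration:
        Rules:
          - Id: archive-then-expire
            Status: Enabled
            Transitions:
              - StorageClass: GLACIER
                TransitionInDays: 30   # assumed: move to cheaper storage after 30 days
            ExpirationInDays: 365      # assumed: expire current versions after a year
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;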

&lt;p&gt;Again, keep in mind that isolation is your best friend here. If you restore a database, you restore the whole database, which means that ALL your clients will be affected. That is why it's very important to divide and conquer: create backup and recovery plans that allow you to perform a partial recovery for only the affected clients. This might mean that you dump the entire database into the vault but let your ops team set up a partial restore process for particular clients. Just have these plans defined, and run recovery drills to verify that you meet your RTO and RPO.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsr1f8iujjvr73mncjg6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsr1f8iujjvr73mncjg6.png" alt="Image description" width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A bit more clarity regarding DRPs: a plan is always a good thing, and keeping it documented is even better. Documenting a DRP doesn't mean just adding words to a doc file. It means defining strategies and tactics, defining decision points, and having recovery scripts ready. You might write these in bash or in Ansible; it doesn't matter. What matters is that you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Know whether you need to start a recovery.&lt;/li&gt;
&lt;li&gt;Know what is still OK and what you need to recover, in other words, what the blast radius is.&lt;/li&gt;
&lt;li&gt;Know what point in time you need to recover from.&lt;/li&gt;
&lt;li&gt;Know that you have tested all of this in the past, so you don't end up in a deeper hole.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some solutions worth exploring for DR are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://velero.io/" rel="noopener noreferrer"&gt;Velero&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stash.run/" rel="noopener noreferrer"&gt;Stash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.kasten.io/latest/" rel="noopener noreferrer"&gt;Kasten K10&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Scalability
&lt;/h1&gt;

&lt;p&gt;The setup so far assumes a single database that serves both reads and writes. You can also create several instances and "schedule" your tenants across them to balance the load. But sooner or later you will probably need separate paths for reads and writes. This means replication, and replication means eventual consistency. Ignoring the eventual consistency logic for now, the architecture for this is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a high-availability setup. You can use leader election or other strategies, but for simplicity let's assume that the PRIMARY is defined. The optimal approach here is to create read REPLICAS ensuring with anti-affinity that each read replica lives in a different AZ. This is especially important considering that block storage volumes are AZ-locked and can't easily migrate between zones.&lt;/li&gt;
&lt;li&gt;Then you have the issue of query routing. Here again, two main approaches work well, depending on whether you prefer to pay the development overhead or leverage a standard solution:

&lt;ul&gt;
&lt;li&gt;Use a read and a write connection string, and have the application create different connections that produce a read and a read_write cursor to the database. Then the application explicitly selects what to use for each case.&lt;/li&gt;
&lt;li&gt;Use a service like Pgpool-II that acts as a reverse proxy in front of the database and, depending on the query type (read or write), routes the query to the correct instance type. For a deeper dive, visit &lt;a href="https://www.mydbops.com/blog/load-balancingquery-routing-using-pgpool-ii-part-ii" rel="noopener noreferrer"&gt;Load Balancing/Query Routing using PGPOOL-II&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
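&lt;p&gt;To make the first routing option a bit more concrete, here is a minimal sketch of how an application Deployment could receive separate read and write connection strings. The variable names, hostnames, user, and database name are all hypothetical, and credentials are deliberately left out (they would normally come from a Secret).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Fragment of an application container spec
env:
  - name: DATABASE_WRITE_URL   # hypothetical variable, points at the primary
    value: postgresql://app@app-postgresql-primary:5432/app
  - name: DATABASE_READ_URL    # hypothetical variable, points at the read replicas
    value: postgresql://app@app-postgresql-read:5432/app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The application then opens one connection pool per URL and explicitly picks the read or the read-write pool for each query.&lt;/p&gt;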

&lt;p&gt;Personally, I prefer the latter approach, as it gives this power to the infrastructure teams, who are more aware of what runs where. You might want to further optimize connectivity by enabling topology-aware routing between the apps and Pgpool-II, and also between Pgpool-II and the database instances (&lt;a href="https://aws.amazon.com/blogs/containers/exploring-the-effect-of-topology-aware-hints-on-network-traffic-in-amazon-elastic-kubernetes-service/" rel="noopener noreferrer"&gt;Exploring the effect of Topology Aware Hints on network traffic in Amazon Elastic Kubernetes Service&lt;/a&gt;), e.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgpool&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
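    &lt;span class="na"&gt;service.kubernetes.io/topology-aware-hints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto"&lt;/span&gt;  &lt;span class="c1"&gt;# Kubernetes 1.27+ uses service.kubernetes.io/topology-mode: Auto&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# spec.topologyKeys was removed in Kubernetes 1.22; the annotation above replaces it&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgpool&lt;/span&gt;  &lt;span class="c1"&gt;# assumed pod label&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;  &lt;span class="c1"&gt;# assumed listener port&lt;/span&gt;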
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Database High Availability (HA) and Split Brain Risk
&lt;/h1&gt;

&lt;p&gt;In a database high availability (HA) setup that relies on long-running connections, upgrading and migrating to new nodes can introduce the risk of a split-brain scenario. This occurs when multiple nodes operate independently, believing they are the primary, leading to data inconsistency and potential loss of transactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario Description
&lt;/h2&gt;

&lt;p&gt;During an upgrade, the following events may unfold:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A primary database node is migrated or replaced.&lt;/li&gt;
&lt;li&gt;Long-running connections on existing nodes persist, unaware of the topology change.&lt;/li&gt;
&lt;li&gt;A new node is introduced as the new primary, but stale connections may still interact with the old node.&lt;/li&gt;
&lt;li&gt;Both old and new nodes process write operations independently.&lt;/li&gt;
&lt;li&gt;Data divergence occurs due to concurrent writes on both nodes.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Potential Causes of Split Brain
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long-lived client connections:&lt;/strong&gt; Clients may not detect topology changes and continue writing to an old node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network partitioning or delays:&lt;/strong&gt; Temporary network issues can cause nodes to operate independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure in leader election mechanisms:&lt;/strong&gt; If multiple nodes incorrectly assume leadership, they can both accept writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lack of connection draining:&lt;/strong&gt; If client connections are not explicitly closed before failover, they may write to an outdated node.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mitigation Strategies
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection Draining:&lt;/strong&gt; Ensure all active connections are gracefully closed before promoting a new node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Failover Testing:&lt;/strong&gt; Simulate failover scenarios before deploying upgrades.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict Fencing Mechanisms:&lt;/strong&gt; Implement strict write fencing to prevent stale nodes from accepting writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short-lived Connections:&lt;/strong&gt; Design clients to use short-lived or automatically re-established connections after failovers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is a sequence diagram illustrating a split-brain scenario during an upgrade and how it will get resolved either by replicas understanding that they are in split-brain and re-establishing connectivity or through a controller that takes over and handles the recovery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2udic91r144z2gox8hty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2udic91r144z2gox8hty.png" alt="Image description" width="474" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this setup seems complex, that's because it is, especially when you consider that you might have multiple replicas that also need to be rescheduled onto new nodes. In many ways, it resembles the &lt;a href="https://en.wikipedia.org/wiki/Tower_of_Hanoi" rel="noopener noreferrer"&gt;Tower of Hanoi&lt;/a&gt; puzzle: careful sequencing is key. That's why it's better to leverage tools to help you out. For Postgres in particular, there is an operator (&lt;a href="https://cloudnative-pg.io/" rel="noopener noreferrer"&gt;CloudNativePG&lt;/a&gt;) that acts as the controller and offers a great deal of help with running operations. In the past, I also got very good results from Patroni. This is also why you need either short-lived connections from your applications or some code to handle these potential errors. If this sounds like a high-risk scenario (and it probably is for most organizations), I would suggest starting with what you are already using, a managed database, and treating it as your primary. Then build a replication system in your cluster. But always keep in mind that you are no longer in ACID land, &lt;strong&gt;you are now in eventual consistency land&lt;/strong&gt;. What has happened is that you traded strict ACID guarantees for less downtime and settled for eventual consistency. Make sure that your engineering team recognizes this as an architectural reality and that the business realizes that "&lt;a href="https://knowyourmeme.com/memes/one-does-not-simply-walk-into-mordor" rel="noopener noreferrer"&gt;One Does Not Simply Walk Into Mordor&lt;/a&gt;".&lt;/p&gt;
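&lt;p&gt;For a sense of what handing this sequencing to an operator looks like, below is a minimal sketch of a CloudNativePG &lt;code&gt;Cluster&lt;/code&gt; resource. The cluster name, instance count, and storage size are assumptions for illustration; a production manifest would also cover backups, resources, and affinity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-postgresql   # hypothetical cluster name
spec:
  instances: 3           # one primary plus two replicas
  storage:
    size: 20Gi           # assumed volume size per instance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a manifest like this, the operator manages primary election and failover, and it exposes separate read-write and read-only Services (suffixed &lt;code&gt;-rw&lt;/code&gt; and &lt;code&gt;-ro&lt;/code&gt;) that applications can use for query routing.&lt;/p&gt;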

&lt;p&gt;Recommendation: Even if you don't manage to get the buy-in to change the entire architecture of your application, start by creating a single replica for departments that don't need real-time data, like CS. Use the replica to get some hands-on experience with business intelligence pipelines, safe in the knowledge that a huge query will not nuke your database and affect your customers.&lt;/p&gt;

&lt;p&gt;The book &lt;a href="https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321" rel="noopener noreferrer"&gt;Designing Data-Intensive Applications&lt;/a&gt; is an amazing source to understand how to handle state.&lt;/p&gt;

&lt;h4&gt;
  
  
  Short or long-lived connections
&lt;/h4&gt;

&lt;p&gt;This is one of the core architectural dilemmas in system design. On one hand, short-lived connections allow you to fail fast and increase the probability of retrieving the most relevant data. However, they come with the overhead of frequently establishing new connections. This cost can be substantial, especially when using SSL/TLS, as it requires an expensive asymmetric encryption handshake to generate a session’s symmetric key. This key might then be used for several minutes—or even just a single query—before the connection is closed. On the other hand, long-lived connections avoid this repeated overhead but introduce complexities, such as handling reconnections, detecting stale or split-brain database replicas, and managing connection pooling effectively. &lt;/p&gt;

&lt;p&gt;The right choice depends on your specific use case. However, at a minimum, you should accept the SSL/TLS overhead for the most critical database connections—especially those handling schema changes. These operations are relatively rare but use high-privilege credentials, making security a top priority. Additionally, since database replication typically relies on long-lived connections, ensure these connections are also secured with SSL/TLS. For your application layer, the decision depends on your exact workload and data requirements. Still, I recommend using SSL/TLS for database cursors handling write operations. It’s fascinating how seemingly “small” non-functional requirements can shape fundamental architectural decisions.&lt;/p&gt;

&lt;h1&gt;
  
  
  Prepare for the Ugly: Ensuring Database Stability in Kubernetes
&lt;/h1&gt;

&lt;p&gt;Running databases in any infrastructure involves the risk of failure. Kubernetes acknowledges these risks as part of life and offers ways to define how to react when these happen. To mitigate these risks, you need to configure Kubernetes resources, Quality of Service (QoS), and pod priority settings correctly to ensure stability and recoverability. Don't just hope for the best, define what happens when the ugly is at your doorstep.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Resource Requests and Limits for Databases
&lt;/h2&gt;

&lt;p&gt;Databases are resource-intensive and sensitive to performance degradation, so setting appropriate CPU and memory requests/limits is crucial, especially keeping in mind that we will be handling state, which in reality removes elasticity. Dynamically adding replicas will increase the load on our system when scaling is required, as the new instances will need to be synced. But before diving deep, let's first define some good approaches to get reliable values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Considerations
&lt;/h3&gt;

&lt;p&gt;Memory allocation for databases is critical: insufficient memory can lead to OOM kills or, where swap is enabled, excessive swapping that slows down performance, while over-allocating wastes valuable resources. Key factors to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Working Set Size (WSS):&lt;/strong&gt; Measure the actively used memory by the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer Cache Usage:&lt;/strong&gt; Monitor how efficiently the database caches data in memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peak Load Analysis:&lt;/strong&gt; Identify memory spikes during high traffic periods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Garbage Collection &amp;amp; Memory Fragmentation:&lt;/strong&gt; For databases like PostgreSQL and MongoDB, factor in memory management behaviors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How to Determine Memory Resources&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Monitor &lt;code&gt;container_memory_working_set_bytes&lt;/code&gt; over time.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;quantile_over_time(0.5, container_memory_working_set_bytes{namespace="%s", container="%s"}[6h])&lt;/code&gt; to estimate a safe &lt;strong&gt;request value&lt;/strong&gt;. You might want to use a higher quantile, but this will affect cost.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;quantile_over_time(0.999, container_memory_working_set_bytes{namespace="%s", container="%s"}[6h])&lt;/code&gt; to estimate a safe &lt;strong&gt;limit value&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Ensure that you calculate over some time with the actual load.&lt;/li&gt;
&lt;/ul&gt;
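&lt;p&gt;If you want these estimates available continuously rather than as ad-hoc queries, one option is to capture them as Prometheus recording rules. The sketch below assumes the Prometheus Operator is installed and uses a hypothetical &lt;code&gt;backend&lt;/code&gt; namespace and &lt;code&gt;postgresql&lt;/code&gt; container name in place of the &lt;code&gt;%s&lt;/code&gt; placeholders; the rule names are also made up for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: db-sizing   # hypothetical name
spec:
  groups:
    - name: db-sizing
      rules:
        # Median working set over 6h: candidate memory request
        - record: db:memory_working_set_bytes:p50_6h
          expr: quantile_over_time(0.5, container_memory_working_set_bytes{namespace="backend", container="postgresql"}[6h])
        # 99.9th percentile over 6h: candidate memory limit
        - record: db:memory_working_set_bytes:p999_6h
          expr: quantile_over_time(0.999, container_memory_working_set_bytes{namespace="backend", container="postgresql"}[6h])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;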

&lt;h3&gt;
  
  
  CPU Considerations
&lt;/h3&gt;

&lt;p&gt;Databases often require consistent CPU resources, and unlike stateless applications, CPU throttling can significantly impact query performance and latency.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline Utilization:&lt;/strong&gt; Identify normal CPU usage using &lt;code&gt;sum(rate(container_cpu_usage_seconds_total[5m]))&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency-Sensitive Workloads:&lt;/strong&gt; If high query performance is required, avoid setting strict CPU limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Processing:&lt;/strong&gt; For databases that run work in parallel (e.g., PostgreSQL, which uses one process per connection plus parallel query workers), ensure enough CPU cores are available.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling Considerations:&lt;/strong&gt; Vertical scaling (increasing CPU per instance) is often more effective than horizontal scaling due to sync overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How to Determine CPU Resources&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;quantile_over_time(0.5, node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="%s", container="%s"}[24h])&lt;/code&gt; to find a baseline for the request value.&lt;/li&gt;
&lt;li&gt;Consider setting requests closer to the &lt;strong&gt;90th percentile&lt;/strong&gt; for latency-sensitive workloads, so the pod keeps its CPU share when the node is under contention while resources are still allocated efficiently.&lt;/li&gt;
&lt;li&gt;Set CPU limits using &lt;code&gt;quantile_over_time(0.999, node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="%s", container="%s"}[6h])&lt;/code&gt; to prevent excessive resource usage while allowing headroom for bursts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disk I/O Considerations
&lt;/h3&gt;

&lt;p&gt;Databases rely heavily on disk performance, and slow I/O can severely degrade performance. Key aspects include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read vs. Write Operations:&lt;/strong&gt; Analyze workload characteristics (e.g., OLTP vs. OLAP).&lt;/li&gt;
&lt;li&gt;
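&lt;strong&gt;Disk Latency:&lt;/strong&gt; Derive average latency from the node_exporter counters, e.g. &lt;code&gt;rate(node_disk_read_time_seconds_total[5m]) / rate(node_disk_reads_completed_total[5m])&lt;/code&gt; (and the &lt;code&gt;node_disk_write_time_seconds_total&lt;/code&gt; / &lt;code&gt;node_disk_writes_completed_total&lt;/code&gt; equivalent for writes), to assess disk performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Throughput and IOPS:&lt;/strong&gt; Monitor &lt;code&gt;rate(node_disk_read_bytes_total[5m])&lt;/code&gt; and &lt;code&gt;rate(node_disk_written_bytes_total[5m])&lt;/code&gt; for throughput, and &lt;code&gt;rate(node_disk_reads_completed_total[5m])&lt;/code&gt; and &lt;code&gt;rate(node_disk_writes_completed_total[5m])&lt;/code&gt; to determine IOPS requirements.&lt;/li&gt;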
&lt;li&gt;
&lt;strong&gt;SSD vs. HDD:&lt;/strong&gt; Use SSDs for low-latency and high-throughput workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How to Determine Disk I/O Requirements&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Measure &lt;strong&gt;average and peak disk utilization&lt;/strong&gt; over time.&lt;/li&gt;
&lt;li&gt;Ensure &lt;strong&gt;sustained disk throughput meets database requirements&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use storage with provisioned IOPS&lt;/strong&gt; (e.g., AWS EBS io1/io2, or gp3 with explicitly provisioned IOPS and throughput) for consistent performance.&lt;/li&gt;
&lt;/ul&gt;
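&lt;p&gt;As an example of turning these measurements into configuration, the following is a minimal StorageClass sketch for the AWS EBS CSI driver. The class name and the IOPS and throughput figures are assumptions; replace them with the numbers you measured under load.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-gp3                  # hypothetical name
provisioner: ebs.csi.aws.com    # AWS EBS CSI driver
parameters:
  type: gp3
  iops: "6000"                  # assumed value, from measured peak IOPS
  throughput: "250"             # assumed value, in MiB/s
volumeBindingMode: WaitForFirstConsumer   # create the volume in the AZ where the pod is scheduled
allowVolumeExpansion: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;WaitForFirstConsumer&lt;/code&gt; matters here because EBS volumes are AZ-locked: delaying volume creation until the pod is scheduled avoids provisioning a disk in a zone the pod cannot run in.&lt;/p&gt;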

&lt;h3&gt;
  
  
  Final Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Profile Database Workloads:&lt;/strong&gt; Different databases (MySQL, PostgreSQL, MongoDB) have unique performance characteristics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Under Load:&lt;/strong&gt; Simulate real traffic to fine-tune resource requests and limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor &amp;amp; Adjust:&lt;/strong&gt; Continuously monitor resource usage and adjust based on observed trends.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By following these best practices, we can ensure stable, efficient, and performant database deployments without over-allocating resources or causing unexpected failures due to under-provisioning. Just make sure you are using some block storage for the actual state so that if a node dies and your workloads are rescheduled, their disks will follow them. Also, keep in mind that disks are AZ-locked.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. QoS Class
&lt;/h2&gt;

&lt;p&gt;Kubernetes assigns QoS classes based on resource settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guaranteed: The last class to be evicted under node resource pressure. Requires every container in the pod to set CPU and memory requests equal to its limits.&lt;/li&gt;
&lt;li&gt;Burstable: Allows flexibility but risks eviction under high cluster load. Suitable for less critical workloads.&lt;/li&gt;
&lt;li&gt;BestEffort: Most vulnerable to eviction. Not recommended for databases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recommendation: Run the exercise above and calculate the values for CPU and memory. Then start with the primary and assign resources that put it in the Guaranteed class. This means taking the calculated limit values for both CPU and memory, adding roughly 20% (or at least rounding up) for headroom, and using exactly the same numbers for the requests.&lt;/p&gt;
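&lt;p&gt;A Guaranteed configuration for the primary could look like the fragment below. The numbers are placeholders, not measurements; what matters is that requests and limits are identical for both CPU and memory in every container of the pod.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Container spec fragment for the primary (placeholder values)
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "2"       # equal to the request
    memory: 8Gi    # equal to the request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;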

&lt;p&gt;Replicas are simpler because you will have several of them, and you should start thinking of them as cattle. They should run as Burstable workloads, since this is where you want to cut costs. So use the request and limit values exactly as they came out of the previous exercise, without the extra headroom.&lt;/p&gt;
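&lt;p&gt;For comparison, a Burstable replica sets requests below limits, as in this fragment. The values are again placeholders standing in for the request and limit quantiles you measured.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Container spec fragment for a replica (placeholder values)
resources:
  requests:
    cpu: 500m      # from the request quantile
    memory: 2Gi
  limits:
    cpu: "2"       # from the limit quantile
    memory: 6Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;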

&lt;h2&gt;
  
  
  3. Pod Priority and Preemption
&lt;/h2&gt;

&lt;p&gt;When the cluster runs out of resources, Kubernetes evicts lower-priority pods first. To protect database pods, you should leverage a PriorityClass with a high value to ensure database pods are scheduled before lower-priority workloads. In reality, what we are doing here is defining how the cluster will do triage once the building is on fire. By the time PriorityClass becomes useful, the "ugly" is no longer at our doorstep; it has walked into the house. At this point it's not IF workloads will stop but WHICH workloads have a higher probability of staying functional. And as always with probabilities, dice are going to get rolled. What we are doing here is loading the dice to try to minimize the fallout.&lt;br&gt;
Example of a high-priority class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scheduling.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PriorityClass&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database-critical&lt;/span&gt;
&lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000000&lt;/span&gt;
&lt;span class="na"&gt;globalDefault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Priority&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;workloads"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Assign the priority class to database deployments, keeping in mind that the primary instance is more critical than replicas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;priorityClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database-critical&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consider here that your PRIMARY pods are more important than your REPLICAS, and also acknowledge that probably not all your tenants are equal. These aspects of reality should be reflected here.&lt;/p&gt;
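&lt;p&gt;One way to express that difference is a second, lower-valued PriorityClass for replicas, sketched below. The class name and value are assumptions; the only requirement is that the value sits below the one used for the primary and above your regular workloads.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: database-replica   # hypothetical name
value: 900000              # assumed value, lower than database-critical
globalDefault: false
description: "Priority for database read replicas"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same pattern extends to tenants: a small number of classes, each mapped to a tier of tenants, keeps the triage order explicit.&lt;/p&gt;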

&lt;h2&gt;
  
  
  4. Anti-Affinity and Pod Disruption Budgets
&lt;/h2&gt;

&lt;p&gt;Node and Zone Anti-Affinity: Spread database pods across nodes to avoid single points of failure.&lt;/p&gt;

&lt;p&gt;To ensure high availability and resilience for database pods, you can use pod anti-affinity rules keyed on node and zone topology. In the example below, database pods are never scheduled on the same node and are preferably spread across availability zones, reducing the risk of downtime due to node or zone failures.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;podAntiAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-database&lt;/span&gt;
              &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kubernetes.io/hostname"&lt;/span&gt;  &lt;span class="c1"&gt;# Ensures pods are spread across different nodes&lt;/span&gt;
          &lt;span class="na"&gt;preferredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
              &lt;span class="na"&gt;podAffinityTerm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-database&lt;/span&gt;
                &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topology.kubernetes.io/zone"&lt;/span&gt;  &lt;span class="c1"&gt;# Prefer different zones&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Pod Disruption Budget (PDB): Ensures at least one database pod remains available during voluntary disruptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;policy/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PodDisruptionBudget&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-pdb&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;minAvailable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-database&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By carefully configuring Kubernetes resources, QoS, priority, anti-affinity, and disruption budgets, you can prepare for the eventuality of the "ugly" and minimize disruptions to your database workloads.&lt;/p&gt;

&lt;h1&gt;
  
  
  Observability
&lt;/h1&gt;

&lt;p&gt;Sample Prometheus rule regarding a generic app database that can be used as a reference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PrometheusRule&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-postgresql&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/component&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-postgresql&lt;/span&gt;
      &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlDown&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql instance is down VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql down (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pg_up{namespace="backend", pod=~"app-postgresql-.*"} == &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PrimaryPostgresqlRestarted&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql restarted VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql restarted (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;time() - process_start_time_seconds{namespace="backend", pod="app-postgresql-primary-0"} &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;60&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlRestarted&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql restarted VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql restarted (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;time() - process_start_time_seconds{namespace="backend", pod=~"app-postgresql-.*"} &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;60&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlRunningOutConnections&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fewer than 10% of connections are still available VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Number of available connections less than 10% (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(((sum(pg_settings_max_connections{namespace="backend", pod=~"app-postgresql-.*"}) by (server) - sum(pg_settings_superuser_reserved_connections{namespace="backend", pod=~"app-postgresql-.*"}) by (server)) - sum(pg_stat_activity_count{namespace="backend", pod=~"app-postgresql-.*"}) by (server)) / sum(pg_settings_max_connections{namespace="backend", pod=~"app-postgresql-.*"}) by (server)) * 100 &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;component&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlExporterError&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql exporter is showing errors. A query may be buggy in query.yaml VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql exporter error (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pg_exporter_last_scrape_error{namespace="backend", pod=~"app-postgresql-.*"} &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlTableNotAutoVacuumed&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Table {{ $labels.relname }} has not been auto vacuumed for 10 days VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql table not auto vacuumed (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(pg_stat_user_tables_last_autovacuum{namespace="backend", pod=~"app-postgresql-.*"} &amp;gt; 0) and (time() - pg_stat_user_tables_last_autovacuum{namespace="backend", pod=~"app-postgresql-.*"}) &amp;gt; 60 * 60 * 24 * &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlTableNotAutoAnalyzed&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Table {{ $labels.relname }} has not been auto analyzed for 10 days VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql table not auto analyzed (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(pg_stat_user_tables_last_autoanalyze{namespace="backend", pod=~"app-postgresql-.*"} &amp;gt; 0) and (time() - pg_stat_user_tables_last_autoanalyze{namespace="backend", pod=~"app-postgresql-.*"}) &amp;gt; 24 * 60 * 60 * &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlTooManyConnections&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgreSQL instance has too many connections (&amp;gt; 80%). VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql too many connections (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (instance, job, server) (pg_stat_activity_count{namespace="backend",pod=~"app-postgresql-.*"}) &amp;gt; min by (instance, job, server) (pg_settings_max_connections{namespace="backend",pod=~"app-postgresql-.*"} * 0.8)&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlNotEnoughConnections&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgreSQL instance should have more connections (&amp;gt; 5) VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql not enough connections (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (datname) (pg_stat_activity_count{namespace="backend",pod=~"app-postgresql-.*", datname!~"template.*|postgres|readme_to_recover"}) &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlDeadLocks&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgreSQL has dead-locks VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql deadlocks (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;increase(pg_stat_database_deadlocks{namespace="backend",pod=~"app-postgresql-.*", datname!~"template.*|postgres|readme_to_recover"}[1m]) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlHighRollbackRate&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ratio of transactions being aborted compared to committed is &amp;gt; 2 % VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql high rollback rate (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by (namespace,datname) ((rate(pg_stat_database_xact_rollback{namespace="backend",pod=~"app-postgresql-.*",datname!~"template.*|postgres|readme_to_recover",datid!="0"}[3m])) / ((rate(pg_stat_database_xact_rollback{namespace="backend",pod=~"app-postgresql-.*",datname!~"template.*|postgres|readme_to_recover",datid!="0"}[3m])) + (rate(pg_stat_database_xact_commit{namespace="backend",pod=~"app-postgresql-.*",datname!~"template.*|postgres|readme_to_recover",datid!="0"}[3m])))) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0.02&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlCommitRateLow&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql seems to be processing very few transactions VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql commit rate low (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate(pg_stat_database_xact_commit{namespace="backend",pod=~"app-postgresql-.*",datname!~"template.*|postgres|readme_to_recover"}[1m]) &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlReplicationLag&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql replica is lagging (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pg_replication_lag_seconds{namespace="backend",pod=~"app-postgresql-.*"} &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlTooManyDeadTuples&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgreSQL dead tuples is too large VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql too many dead tuples (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;((pg_stat_user_tables_n_dead_tup{namespace="backend",pod=~"app-postgresql-.*"} &amp;gt; 5)) / (pg_stat_user_tables_n_live_tup{namespace="backend",pod=~"app-postgresql-.*"} + pg_stat_user_tables_n_dead_tup{namespace="backend",pod=~"app-postgresql-.*"}) &amp;gt;= &lt;/span&gt;&lt;span class="m"&gt;0.1&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlTooManyLocksAcquired&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Too many locks were acquired on the database. If this alert happens frequently, we may need to increase the postgres setting max_locks_per_transaction. VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql too many locks acquired (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;((sum (pg_locks_count{namespace="backend",pod=~"app-postgresql-.*"})) / (pg_settings_max_locks_per_transaction{namespace="backend",pod=~"app-postgresql-.*"} * pg_settings_max_connections{namespace="backend",pod=~"app-postgresql-.*"})) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0.20&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlLatency&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgres is running slow. VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql is lagging over 1 second (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pg_stat_activity_max_tx_duration{namespace="backend",pod=~"app-postgresql-.*",state="active"} &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlCacheHitRate&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgres cache hit rate is very low. VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql cache hit rate is below 80% (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100 * (rate(pg_stat_database_blks_hit{namespace="backend",pod=~"app-postgresql-.*"}[2m]) / ((rate(pg_stat_database_blks_hit{namespace="backend",pod=~"app-postgresql-.*"}[2m]) + rate(pg_stat_database_blks_read{namespace="backend",pod=~"app-postgresql-.*"}[2m]))&amp;gt;0)) &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;80&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlMemoryAvailable&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgres is using more than 80% of its container memory limit. VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql running out of available memory (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum by(pod,container)(container_memory_usage_bytes{namespace="backend",pod=~"app-postgresql-.*", container!="POD",container!=""}) / sum by(pod,container)(kube_pod_container_resource_limits{namespace="backend",pod=~"app-postgresql-.*",resource="memory"}) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0.8&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlRequestedBufferCheckpoints&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgreSQL uses the buffer checkpoints to write the dirty buffers on disk, so it creates safe points for the Write Ahead Log (WAL). These checkpoints are scheduled periodically but also can be requested on-demand when the buffer runs out of space. A high number of requested checkpoints compared to the number of scheduled checkpoints can impact directly the performance of your PostgreSQL instance. To avoid this situation you could increase the database buffer size. VALUE = {{ $value }} LABELS = {{ $labels }}&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql is over using write buffer (instance {{ $labels.instance }})&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate(pg_stat_bgwriter_checkpoints_req_total{namespace="backend",pod=~"app-postgresql-.*"}[5m]) / (rate(pg_stat_bgwriter_checkpoints_req_total{namespace="backend",pod=~"app-postgresql-.*"}[5m]) + rate(pg_stat_bgwriter_checkpoints_timed_total{namespace="backend",pod=~"app-postgresql-.*"}[5m])) * 100 &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;80&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need to expose business-related logic and alert on it, you can define a custom metric in the metrics exporter, e.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;pg_database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Name of the database&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LABEL&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;size_bytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Size of the database in bytes&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GAUGE&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SELECT pg_database.datname AS name, pg_database_size(pg_database.datname) AS size_bytes&lt;/span&gt;
    &lt;span class="s"&gt;FROM pg_database;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then create an alert based on that metric. Keep in mind that these queries run against the database every time Prometheus scrapes the exporter, so think carefully: do you need exact values, or are estimates enough? Use &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; to check what each query costs, and take a minute to consider whether it MUST run on your primary database or could run on a replica.&lt;br&gt;
&lt;/p&gt;
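
&lt;p&gt;One way to limit that load is to let the exporter cache a query's result between scrapes. The following is a minimal sketch, assuming the exporter is postgres_exporter reading a custom queries file and that its &lt;code&gt;cache_seconds&lt;/code&gt; option is available in your version; the 300-second value is an arbitrary illustration, not a recommendation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;pg_database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Assumption: reuse the cached result for 5 minutes instead of querying on every scrape&lt;/span&gt;
  &lt;span class="na"&gt;cache_seconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Name of the database&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LABEL&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;size_bytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Size of the database in bytes&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GAUGE&lt;/span&gt;
  &lt;span class="c1"&gt;# Column aliases match the metric keys declared above&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;SELECT pg_database.datname AS name, pg_database_size(pg_database.datname) AS size_bytes&lt;/span&gt;
    &lt;span class="s"&gt;FROM pg_database;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Estimates are another way to keep a query cheap, as in this table row-count metric:&lt;br&gt;
&lt;/p&gt;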

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;pg_table&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Name of the table&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LABEL&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;row_count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Estimated number of rows in the table&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GAUGE&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;SELECT relname AS name, reltuples::bigint AS row_count&lt;/span&gt;
    &lt;span class="s"&gt;FROM pg_class&lt;/span&gt;
    &lt;span class="s"&gt;JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid&lt;/span&gt;
    &lt;span class="s"&gt;WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND relkind = 'r';&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Estimate vs. Exact: The query uses reltuples from pg_class, which is an estimate refreshed by the last VACUUM or ANALYZE. This avoids a full COUNT(*), which can be expensive on large tables.&lt;br&gt;
Performance Impact: The query is fast because it reads catalog data instead of scanning entire tables.&lt;br&gt;
Primary vs. Replica: Ideally, run it on a replica if one is available; the statistics are not real-time, but they are good enough for monitoring.&lt;/p&gt;
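
&lt;p&gt;To close the loop on alerting, a rule on the custom database-size metric could be appended to the rules list of the PrometheusRule shown earlier. This is a sketch under assumptions: the exporter publishes the metric as &lt;code&gt;pg_database_size_bytes&lt;/code&gt; with a &lt;code&gt;name&lt;/code&gt; label, and the alert name, the 50 GiB threshold, and the 10m duration are hypothetical values to adapt to your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fragment: one more entry for spec.groups[].rules (alert name and threshold are hypothetical)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgresqlDatabaseSizeHigh&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Database {{ $labels.name }} has grown to {{ $value | humanize1024 }}B LABELS = {{ $labels }}&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Postgresql database is large (instance {{ $labels.instance }})&lt;/span&gt;
  &lt;span class="c1"&gt;# 50 * 1024 * 1024 * 1024 bytes = 50 GiB&lt;/span&gt;
  &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pg_database_size_bytes{namespace="backend", pod=~"app-postgresql-.*"} &amp;gt; 50 * 1024 * 1024 * 1024&lt;/span&gt;
  &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A longer &lt;code&gt;for&lt;/code&gt; duration suits a slow-moving value like database size, since a brief spike is rarely worth paging anyone about.&lt;/p&gt;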
&lt;h1&gt;
  
  
  Stakeholders
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Business: Justify, Estimate, and Monitor Costs
&lt;/h2&gt;

&lt;p&gt;When working with business stakeholders, your main concerns will likely revolve around &lt;a href="https://insights.roozbeh.ca/premature-scaling-a-challenge-in-enterprise-readiness-459d7a483654"&gt;Premature Scaling&lt;/a&gt; and &lt;a href="https://insights.roozbeh.ca/infrastructure-cost-leverage-icl-09246939bf44"&gt;Infrastructure Cost Leverage&lt;/a&gt;. To align with their priorities, justify why each step is necessary, provide cost estimates (especially those driven by CPU and other resource requirements), and demonstrate cost-monitoring strategies that prevent budget overruns. Remember that business teams are balancing financial resources across multiple departments. They see infrastructure as an investment, so make sure they understand the return, and explain how each of these steps contributes to their feature matrix.&lt;/p&gt;
&lt;h2&gt;
  
  
  Developers: Observe, Adapt, and Abstract
&lt;/h2&gt;

&lt;p&gt;Don't assume how developers work: observe, verify, and validate before introducing changes. To be an effective DevOps engineer, understand their existing workflows before making recommendations, build internal tooling that simplifies their processes, and abstract complexity while recognizing that it still exists; hiding it doesn't make it disappear. Your goal is to empower developers with seamless tooling while ensuring operational efficiency under the hood.&lt;/p&gt;
&lt;h3&gt;
  
  
  Tools and examples for working with developers
&lt;/h3&gt;

&lt;p&gt;devbox: Helps developers set up their environment in a consistent manner.&lt;br&gt;
Justfile: Helps standardize actions, especially the ones with friction. A good example is generating dynamic credentials for your engineering teams. Consider the following solution, which uses Bitwarden to fetch secrets that will be used in the next steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Default recipes list&lt;/span&gt;
@default:
    just &lt;span class="nt"&gt;--list&lt;/span&gt;

&lt;span class="c"&gt;# Check if Bitwarden is unlocked and prompt for unlock if needed&lt;/span&gt;
&lt;span class="c"&gt;# Returns the BW_SESSION token&lt;/span&gt;
_bw:
    &lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
    &lt;span class="c"&gt;# bw config server https://vault.bitwarden.eu&lt;/span&gt;
    &lt;span class="nv"&gt;vault_status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;bw status | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.status'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$vault_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"locked"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔐 Bitwarden vault is locked. Attempting to unlock..."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
        &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-sp&lt;/span&gt; &lt;span class="s2"&gt;"Enter your Bitwarden password: "&lt;/span&gt; BW_PASSWORD
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
        &lt;span class="nv"&gt;BW_SESSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;bw unlock &lt;span class="nt"&gt;--raw&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BW_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-ne&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
            &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Failed to unlock Bitwarden"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
            &lt;span class="nb"&gt;exit &lt;/span&gt;1
        &lt;span class="k"&gt;fi
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Bitwarden vault unlocked successfully"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="k"&gt;else
        &lt;/span&gt;&lt;span class="nv"&gt;BW_SESSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;bw unlock &lt;span class="nt"&gt;--check&lt;/span&gt; &lt;span class="nt"&gt;--raw&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Bitwarden vault is already unlocked"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="k"&gt;fi
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BW_SESSION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Export AWS credentials from Bitwarden&lt;/span&gt;
_aws:
    &lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
    &lt;span class="nv"&gt;BW_SESSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;just _bw&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;creds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;bw list items &lt;span class="nt"&gt;--session&lt;/span&gt; &lt;span class="nv"&gt;$BW_SESSION&lt;/span&gt; &lt;span class="nt"&gt;--folderid&lt;/span&gt; 425498e0-b0d7-4c7e-a27e-b26500ad72cd &lt;span class="nt"&gt;--search&lt;/span&gt; aws&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$creds&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No AWS credentials found in Bitwarden"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
        &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi
    &lt;/span&gt;&lt;span class="nv"&gt;region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$creds&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[0] | .fields[] | select(.name == "region") | .value'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;access_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$creds&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[0] | .fields[] | select(.name == "aws_access_key_id") | .value'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;secret_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$creds&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[0] | .fields[] | select(.name == "aws_secret_access_key") | .value'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$region&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$access_key&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$secret_key&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Failed to extract AWS credentials from Bitwarden item"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
        &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"export AWS_DEFAULT_REGION=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$region&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"export AWS_ACCESS_KEY_ID=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$access_key&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"export AWS_SECRET_ACCESS_KEY=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$secret_key&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example showcases the principle: developers do not need to store credentials on their machines to do their work. Instead, a human-friendly method fetches credentials on demand for the next steps. Rather than trying to stop people from saving credentials, remove the reason to save them. Engineers won't need &lt;code&gt;.env&lt;/code&gt; files if they have a seamless way to access credentials.&lt;/p&gt;
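
&lt;p&gt;One way to consume the output of a recipe like &lt;code&gt;_aws&lt;/code&gt; is to evaluate the printed export lines inside a subshell, so the credentials exist only for the duration of a single command. The following is a minimal sketch under assumptions: &lt;code&gt;emit_aws_exports&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;just _aws&lt;/code&gt; with placeholder values, and &lt;code&gt;with_aws&lt;/code&gt; is an illustrative helper name that is not part of the Justfile above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash

# Hypothetical stand-in for `just _aws`: prints export statements on stdout.
# The values are placeholders, not real credentials.
emit_aws_exports() {
    echo 'export AWS_DEFAULT_REGION="eu-west-1"'
    echo 'export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"'
    echo 'export AWS_SECRET_ACCESS_KEY="example-secret"'
}

# Run one command with the credentials loaded.
# The function body is a subshell, so the exported variables
# never reach the calling shell and nothing is written to disk.
with_aws() (
    exports="$(emit_aws_exports)" || exit 1
    eval "$exports"
    "$@"
)

# Example: the variable is visible to the wrapped command only.
with_aws printenv AWS_DEFAULT_REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In a real setup, swapping the stand-in for &lt;code&gt;just _aws&lt;/code&gt; should behave the same way, because the recipe prints only the export lines on stdout and sends its status messages to stderr.&lt;/p&gt;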

&lt;h1&gt;
  
  
  Material:
&lt;/h1&gt;

&lt;p&gt;The following resources can't be recommended enough.&lt;br&gt;
&lt;a href="https://www.amazon.com/Database-Internals-Deep-Distributed-Systems/dp/1492040347" rel="noopener noreferrer"&gt;https://www.amazon.com/Database-Internals-Deep-Distributed-Systems/dp/1492040347&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321" rel="noopener noreferrer"&gt;https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.amazon.com/Software-Architecture-Trade-Off-Distributed-Architectures/dp/1492086894" rel="noopener noreferrer"&gt;https://www.amazon.com/Software-Architecture-Trade-Off-Distributed-Architectures/dp/1492086894&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/@DevOpsToolkit" rel="noopener noreferrer"&gt;https://www.youtube.com/@DevOpsToolkit&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.udemy.com/course/database-engines-crash-course" rel="noopener noreferrer"&gt;https://www.udemy.com/course/database-engines-crash-course&lt;/a&gt;&lt;br&gt;
&lt;a href="https://samber.github.io/awesome-prometheus-alerts/rules.html" rel="noopener noreferrer"&gt;https://samber.github.io/awesome-prometheus-alerts/rules.html&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  PS
&lt;/h1&gt;

&lt;p&gt;Please dev.to add support for &lt;code&gt;mermaid sequenceDiagram&lt;/code&gt;...&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloudnative</category>
      <category>database</category>
    </item>
  </channel>
</rss>
