Written by Eddie Jaoude (see outro)
How is your PostgreSQL (aka Postgres) database performing in production after releasing that latest shiny new feature? If you don’t measure its performance, you will never know; the best you can hope for is an educated guess. Do you want to rely on guesswork to decide whether to roll out this new feature to your users?
In this blog post I will take you through the steps to build a dashboard that monitors a PostgreSQL database. The dashboard will also be annotated whenever we enable or disable features, which is an effective way of monitoring the impact that enabling a new feature has on the database.
All tools that I refer to in this blog post are Open Source and free:
- NextJS for our app https://nextjs.org/
- PostgreSQL as our database https://www.postgresql.org/
- PostgreSQL Exporter to expose metrics https://github.com/prometheus-community/postgres_exporter
- Prometheus to collect metrics https://prometheus.io/
- Grafana to display metrics https://grafana.com/
- Flagsmith for feature flag management https://www.flagsmith.com/
Prerequisites
So that you can follow along please make sure that you have the following:
- Docker
- Flagsmith (I will use their free cloud account, but you can also host it yourself on Docker).
Architecture
I will be using the following architecture, but you can move any item between layers to better suit your situation:
- Host machine
  - Our application (NextJS)
  - k6 to simulate users
- Docker
  - PostgreSQL database
  - PostgreSQL Exporter
  - Prometheus
  - Grafana
- Cloud
  - Flagsmith (you can run this in Docker if you prefer)
PostgreSQL
For this blog I will be using the SQL database Postgres. Postgres is a popular Open Source relational database that is super robust and stable, with decades of development behind it. I find it has a great balance of the latest trending features without compromising stability. I have used Postgres in projects ranging from startups to UK banks, and it handles a wide variety of data and performance situations without skipping a beat.
If you already have a PostgreSQL database then please feel free to skip this section.
This Docker command will create a PostgreSQL 14 container, set the username and password, and expose the port on our local machine:
docker run --name postgres14 -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -p 5432:5432 -d postgres:14
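If you want to confirm the database is up before moving on, you can run a quick query inside the container (a small sketch; adjust the username if you changed it in the command above):
docker exec -it postgres14 psql -U user -d postgres -c "SELECT version();"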
Create a small private network for all our Docker containers to communicate together:
docker network create postgres-network
Then add the new PostgreSQL database to this private network. This is achieved by using the container name we created before:
docker network connect postgres-network postgres14
Then inspect the network to get the PostgreSQL database’s IP, specifying the name of the network to inspect:
docker network inspect postgres-network
In the JSON output there will be a “Containers” section that lists the containers in the network along with their respective IPs. You will need the PostgreSQL IP in the next section.
"Containers": {
"4469e0602d83ac7fa2bd70ecbbcb34c41911cc342506c0f0a67f71365a61d54a": {
"Name": "postgres14",
"EndpointID": "5c8a26405db3598f472e9dcb4217b7b0f1d117974a5e2981e301bcda850a99e1",
"MacAddress": "02:42:ac:12:00:04",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
}
}
}
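As a shortcut, you can also ask Docker for just the container names and IPs using its built-in Go templating (a sketch; the field names match the inspect output above):
docker network inspect postgres-network --format '{{range .Containers}}{{println .Name .IPv4Address}}{{end}}'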
PostgreSQL Exporter
Now that we have a database, we need to get the performance metrics out of it. The PostgreSQL Exporter is the industry-standard way to gather PostgreSQL metrics for Prometheus to consume. It is a stable project that has been around for many years and is still gaining new features; at the time of writing, “Multi-Target Support” has been added as a beta feature. With plenty of content and use cases out there, you will not be short of examples to match your situation.
To get the Docker container running we need to pass some more parameters. In addition to the container name and the exposed port, specify the PostgreSQL database IP (from the docker network inspect command at the end of the previous section), along with the database username and password.
docker run --name postgres-exporter -p 9187:9187 -e DATA_SOURCE_URI="172.18.0.2:5432/postgres?sslmode=disable" -e DATA_SOURCE_USER=user -e DATA_SOURCE_PASS=password -d quay.io/prometheuscommunity/postgres-exporter
As before, add the new container to the private Docker network with:
docker network connect postgres-network postgres-exporter
Test that the PostgreSQL Exporter is working by visiting this URL in the browser or using curl on your host machine:
curl "http://localhost:9187/metrics"
You can learn more about the PostgreSQL Exporter in their GitHub repository https://github.com/prometheus-community/postgres_exporter
Next inspect the private network again for the PostgreSQL Exporter IP, as this will be needed in the following section:
docker network inspect postgres-network
Prometheus
At this stage we have a database and we have the performance metrics, but we need to store the collected data over time. Prometheus is the go-to Open Source tool for “metrics to insight”, built with monitoring and alerting in mind. It stores the data in a time-series format and has many integrations with popular tools and platforms. With great documentation, an active GitHub repo (50k+ stars), and a large community, it is no surprise that Prometheus is the go-to tool.
Although Prometheus has a big ecosystem it is straightforward to get started. All we will need is a small YAML file and one Docker command.
Create a YAML file called “prometheus.yml”. You can keep this anywhere you prefer, but I have chosen to do so in my “Downloads” folder (full path “~/Downloads/prometheus.yml”).
In this file, first set global configs for “scrape_interval” and “evaluation_interval”. Note that these global configs can be overridden at the job level if required when you have multiple jobs.
- scrape_interval: defines how often targets are scraped. You can change this to your preference; I have selected 15 seconds here.
- evaluation_interval: defines how often rules are evaluated for alerting.
It is recommended in most cases to have both these global configs set at the same value.
Below the global configs, we specify the jobs under “scrape_configs”. Each job is made up of the following:
- job_name: you can use any name you prefer; it will show on your Prometheus targets page (screenshot below)
- static_configs
  - targets: the IP and port to scrape, which in this example is the PostgreSQL Exporter. Use the IP from the docker network inspect command in the previous section.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "postgres-exporter"
    static_configs:
      - targets: ["172.18.0.3:9187"]
Now run the Prometheus Docker container, mounting the YAML config file we just created:
docker run --rm --name prometheus -p 9090:9090 -v ~/Downloads/prometheus.yml:/etc/prometheus/prometheus.yml -d prom/prometheus
The last step in this section is to add our new Prometheus container to the private network we created. The command to do this is:
docker network connect postgres-network prometheus
Then visit http://localhost:9090/targets and you will see the job with a state of “UP”:
Note: the Prometheus container was started with --rm, so if you stop it, it will no longer be part of the private network. When you re-run it, you will need to connect it to the network again so it has access to the other containers. If you have restarted other containers, also check that none of their IPs have changed, as these might need updating in the config files and the network.
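One way to avoid chasing IPs altogether: containers on a user-defined Docker network can reach each other by container name, so the scrape target could reference the exporter’s name instead of its IP. A minimal sketch of the relevant part of prometheus.yml, assuming the container names used above:
scrape_configs:
  - job_name: "postgres-exporter"
    static_configs:
      # Docker's embedded DNS resolves the container name on the user-defined network
      - targets: ["postgres-exporter:9187"]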
You can also learn more about Prometheus on their official website: https://prometheus.io/docs/introduction/overview/
Grafana
Now that we have our Postgres performance metrics stored, we need to visualise the data. Grafana visualises, queries and alerts on data from multiple sources, including Prometheus as well as SQL/NoSQL databases. Because Grafana is so popular and customisable, there is a large community with pre-existing templates to kickstart our Grafana dashboard, which we can then customise as needed.
Grafana can run on Docker with one command and uses port 3000. I will map this externally to port 3001 on my host machine because my NextJS app is already using port 3000:
docker run -d -p 3001:3000 --name=grafana grafana/grafana-enterprise
Add the new container to our network:
docker network connect postgres-network grafana
Visit the Grafana Dashboard http://localhost:3001 in the browser. You can log in with the username “admin” and the password “admin” - you will be prompted to update the password, but if running Grafana locally you can click the “skip” button for now.
First open the side menu, expand “Connections” and click “Data sources”. Then you can click “Add data source”.
Select Prometheus and you will need to add the Prometheus URL. You can get the IP by inspecting the Docker network again:
docker network inspect postgres-network
Only the Prometheus URL is required, and don’t forget to include the port (for example http://172.18.0.4:9090, using the Prometheus IP from your own inspect output). The other fields are optional and can be updated later if required:
Now that there is a data source, a great place to start a Grafana Dashboard is from an existing template, which we can then customise further.
Visit your Grafana Dashboard at http://localhost:3001/dashboard/import, paste in the Grafana Dashboard template URL https://grafana.com/grafana/dashboards/455-postgres-overview/ and click “Load”. A new form will appear where you can rename the dashboard, but the important part is to select the Prometheus data source.
My imported Grafana Dashboard looks like this:
List your running Docker containers with this command; you should see the postgres14, postgres-exporter, prometheus and grafana containers running:
docker ps
If you want to read more about Grafana then check out their website: https://grafana.com/docs/grafana/latest/
Flagsmith
We have all been there: our project needs some feature flags and we think we are being smart by putting them in a config file or environment variables. In reality, changing them still requires a deploy, only the developers/DevOps in the team can make those changes, and the changes are not instant because they usually require a full redeploy.
To manage my feature flags, I don’t like to reinvent the wheel, so I use the Open Source project Flagsmith and their cloud platform. Flagsmith has more features than I could ever build myself, and it lets me ship faster and more confidently by decoupling my deployments from my releases, as well as offering other great features such as releasing to specific users or groups of users before everyone else.
With Flagsmith we can not only toggle this feature on/off but we can also expose it to specific people or groups of people.
Sign up
First start by signing up to Flagsmith's free account https://www.flagsmith.com/.
Once you sign up / log in, you will create an “Organisation” and then a “Project”:
Create a feature flag
Click the “Create Feature” button at the top right.
Give the feature a name. One thing to take into consideration is that you cannot change the name later.
Leave “Enabled by default” off.
“Value” is not needed for our feature as we will use enable/disable only. A value would be used to control content, for example whether to show a banner and what the banner says (see the short sketch after these steps).
You can also give a “Description”, but note that you can change this later.
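To make the value idea concrete, here is a small sketch of how a value-based flag could be read with the same React SDK used later in this post (the “banner” flag name is hypothetical):
"use client";
import { useFlags } from "flagsmith/react";

export default function Banner() {
  // "banner" is a hypothetical flag whose value holds the banner text
  const { banner } = useFlags(["banner"]);

  if (!banner.enabled) {
    return null;
  }

  // The flag's remote config value controls the content shown
  return <p>{banner.value}</p>;
}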
Integrate with NextJS
Flagsmith has many integrations so do make sure that you check out their great documentation.
To use Flagsmith with NextJS, add it as an npm dependency with the command:
npm i flagsmith --save
Create a NextJS provider in file “src/app/providers.js”:
"use client";
import { FlagsmithProvider } from "flagsmith/react";
import { createFlagsmithInstance } from "flagsmith/isomorphic";
import { useRef } from "react";
export default function Providers({ serverState, children }) {
const flagsmithInstance = useRef(createFlagsmithInstance());
return (
<FlagsmithProvider
flagsmith={flagsmithInstance.current}
serverState={serverState}
>
<>{children}</>
</FlagsmithProvider>
);
}
Open the “src/app/layout.js” file and in the import section add (note the lowercase “./providers”, matching the file created above):
import { createFlagsmithInstance } from "flagsmith/isomorphic";
import Providers from "./providers";
Then in the RootLayout function add (don’t forget to make the function async):
const flagsmith = createFlagsmithInstance();
await flagsmith.init({
environmentID: process.env.NEXT_PUBLIC_FLAGSMITH_ID,
});
const serverState = flagsmith.getState();
console.log(serverState); // just to check the data is coming through
Wrap the “body” in the provider:
<Providers serverState={serverState}>
<body className={inter.className}>{children}</body>
</Providers>
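Putting those pieces together, the whole “src/app/layout.js” might look roughly like this (the Inter font, globals.css import and html lang attribute are assumptions based on the default create-next-app scaffold):
import { Inter } from "next/font/google";
import { createFlagsmithInstance } from "flagsmith/isomorphic";
import Providers from "./providers";
import "./globals.css";

const inter = Inter({ subsets: ["latin"] });

export default async function RootLayout({ children }) {
  // Initialise Flagsmith on the server so the flags are available on first render
  const flagsmith = createFlagsmithInstance();
  await flagsmith.init({
    environmentID: process.env.NEXT_PUBLIC_FLAGSMITH_ID,
  });
  const serverState = flagsmith.getState();
  console.log(serverState); // just to check the data is coming through

  return (
    <html lang="en">
      <Providers serverState={serverState}>
        <body className={inter.className}>{children}</body>
      </Providers>
    </html>
  );
}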
Lastly, add your “NEXT_PUBLIC_FLAGSMITH_ID” to the “.env” file. You can find this key under the “SDK Keys” menu on the left of your Flagsmith Dashboard.
Your “.env” file should look something like this:
NEXT_PUBLIC_FLAGSMITH_ID=3C5QvbtgZ3C5Qvbxyz
Let’s run the app and check it all works together as expected:
npm run dev
There should be no visual change in the UI, but in the terminal there will be further output from Flagsmith because of our console.log.
In my terminal I have the following output, which contains the feature flags I created in Flagsmith. (Currently I only have one feature flag; if there were more, you would see them in the output as well.)
✓ Compiled in 101ms (496 modules)
{
  api: 'https://edge.api.flagsmith.com/api/v1/',
  environmentID: '3C5QvbtgZh3C5QvbtgZh',
  flags: {
    stats: {
      id: 98466,
      enabled: false,
      value: ''
    }
  },
  identity: undefined,
  ts: null,
  traits: {},
  evaluationEvent: null
}
Use Feature Flag
On my homepage, I am thinking of showing some near real-time statistics, but I am concerned about the performance impact. By wrapping the feature in a Flagsmith feature flag, I can enable it for a single user, a group of users or everyone, as well as disable it if required. I have created a Stats component that I will include on the homepage; inside it, we check the status of the “stats” feature flag.
"use client";
import { useFlags } from "flagsmith/react";
export default function Stats() {
const { stats } = useFlags(["stats"]);
if (!stats.enabled) {
return null;
}
return (
// ...
);
}
Now we can show/hide these stats from the Flagsmith Dashboard.
Integrate with Grafana
When we toggle features on or off in our app, it is important to know the direct impact on the application’s performance. Yes, we could try to guess what happened when, but this is neither efficient nor accurate. It is much better to integrate Flagsmith with Grafana, so that it actually puts a marker (called a “Grafana annotation”) on the graph for us.
The integration is straightforward. However we need to expose our Grafana instance to the internet so Flagsmith in the cloud can communicate with it.
If you are not using Flagsmith cloud (so your Flagsmith instance is local to your Grafana), or your Grafana is already on the internet because you are using Grafana Cloud, then you can skip this step.
There are multiple ways to achieve this, but my preferred way is to use an npm package, which needs just two commands:
Install the npm package “localtunnel” globally with the command:
npm install -g localtunnel
Tell localtunnel which local port to expose to the internet and forward to my local Grafana instance:
lt --port 3001
This command returns a URL that can be used for the integration with Flagsmith. Make a note of this URL, as it will be needed in the next steps on the Flagsmith Dashboard.
Next, get a “service account token” from Grafana. This can be done by opening the side menu, clicking “Users and access”, then “Service accounts”, and giving the new service account a name.
In the newly created service account, create a token.
Copy this token to your clipboard, as you will also need it in the next step on the Flagsmith Dashboard.
Go to the Flagsmith Dashboard and click on “Integrations”:
Then scroll down to “Grafana” and click on “Add Integration”. Add the URL that was generated in the terminal and the “service account token” previously copied.
Now you will see the integration is complete.
The final step is to display the Flagsmith feature flag changes on the Grafana Dashboard. We will need to edit the Grafana Dashboard we imported earlier so that it shows the annotations as they occur.
Go back to the Grafana Dashboard created earlier and press “Edit”.
In the Grafana Dashboard settings click the “Annotations” tab and then “Add annotation query”.
Give the annotation a name and select “Grafana” as the data source. In the filter drop-down select “Tags”, then choose the tags you are interested in; I selected my whole Flagsmith project.
Save the annotation changes to the Grafana Dashboard and exit edit mode, returning to the main Grafana Dashboard view. You will now have a “Flagsmith” toggle at the top of the Grafana Dashboard to show/hide the annotations.
With this toggle enabled, each time changes are made on the Flagsmith Dashboard they will appear on the Grafana Dashboard. I will demonstrate this shortly, but first I will generate some user load to make the graphs more interesting.
k6
Simulating and creating user traffic can be tricky. There are many tools out there and I have tried and used many over the years. Recently I came across k6, a load testing tool from Grafana Labs. It is designed to be developer friendly and covers everything from browser performance testing to chaos and resilience testing. k6 has packages for the popular operating systems as well as a Docker image and a standalone binary.
Installation
I will install k6 on my Mac with brew:
brew install k6
Script
In k6’s Getting Started guide there are some great example scripts. I have taken their “Ramp VUs up and down in stages” example and modified it a little for my needs.
The script has the imports at the top as you would expect, followed by the test configuration. The configuration contains an array of “stages”, where each item is an object with a “duration” and a “target”. With this “stages” array you can ramp the number of virtual users up and down at specific times.
After exporting the config options, export a default function that makes a request against the URL and validates the response; in this case we expect a status code of 200.
I will add the following k6 script to a file called “script.js” in the root of the project:
import http from "k6/http";
import { check, sleep } from "k6";

// Test configuration
export const options = {
  thresholds: {
    // Assert that 99% of requests finish within 3000ms.
    http_req_duration: ["p(99) < 3000"],
  },
  // Ramp the number of virtual users up and down
  stages: [
    { duration: "30s", target: 100 },
    { duration: "1m", target: 50 },
    { duration: "3m", target: 30 },
    { duration: "20s", target: 0 },
  ],
};

// Simulated user behavior
export default function () {
  let res = http.get("http://localhost:3000");
  // Validate response status
  check(res, { "status was 200": (r) => r.status == 200 });
  sleep(1);
}
To run the above k6 script, use the command:
k6 run script.js
The output is:
While k6 is running, we can toggle our Flagsmith feature on and off and see the effect it has on the application. Each time a change is made on the Flagsmith Dashboard, the annotation appears, and we can hover over the annotation line to get more information.
Now you can test in production and make an informed decision as to whether you should roll out the new feature to everyone.
Troubleshooting
You might find yourself having connection issues between the containers during setup, or the setup initially worked but after restarting one or more containers (or your host machine) it no longer works. I found that this is usually due to container IP changes, which caught me out multiple times.
Note that you might not get any visible errors in the logs, but no data will be coming through the pipeline. Check the IP of each container, make sure it is in the private network, and confirm the relevant YAML config files are up to date.
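A quick way to see whether Prometheus can still reach the exporter is to query its targets API and check the reported health (a sketch; the exact JSON output varies between Prometheus versions):
curl -s "http://localhost:9090/api/v1/targets" | grep -o '"health":"[^"]*"'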
Conclusion
What effect does your latest feature have on your app’s health? If you can’t quantify the impact, how do you know whether you should roll it out to all your users? There is no need to build your own scripts and tooling to capture and measure the performance of your app before and after changes; that is complex and time-consuming, especially when you could be using this time to focus on your app. There are so many Open Source tools that you can use and integrate together to help you measure performance and improve your product, allowing you to focus on what you can deliver and the benefits to your users. Plus, if you don’t want to host and manage these tools, most of them have cloud versions you can use.
Having a strong foundation for your project will help you make more informed decisions rather than guessing.
When something happens to your production app and you are trying to understand the “how” and “why” while debugging, it becomes very difficult if you have not captured the historical data.
Key takeaways:
1: You can’t improve what you don’t measure. Measure before and after the changes so you can compare the impact these changes have on your project.
2: Be proactive. Simulate user load and see how your product performs; one user is different to 1000 users.
3: Use feature flags to safely and confidently test in production without affecting your users, which will allow you to make more accurate decisions.
Be proactive with all your changes, know the impact, and be confident to deploy on a Friday!
About the Author
Eddie is passionate about technologies and products which contribute towards a developer's work becoming more efficient and high quality.
Eddie uses his 15+ years of Fullstack expertise to offer personalised consulting on Open Source best practices, as well as creating technical content for the tech community to raise awareness of, review and teach his clients’ latest products and services. To name a few areas, Eddie has a keen interest in JavaScript/TypeScript, APIs, databases and DevOps.