Ivan Porollo

Posted on Feb 8, 2022 • Edited on Feb 18, 2022

Setting up data monitoring for PostgreSQL

#postgres #python #opensource #tutorial

Data quality and reliability are still a source of headaches for data organizations today. Monosi exists to resolve the issues that teams face.

Monosi is an open source data observability and monitoring platform for data teams (see Monosi Github). It is used to quickly set up monitors on a data store to run checks for data quality issues and alert on detected anomalies.

This article will walk you through how to get started monitoring a PostgreSQL database in less than 10 minutes with Monosi.

Pre-requisites

For this tutorial, we are going to use PostgreSQL as our database to monitor. We are going to use a public Postgres instance that is available online from RNAcentral.

Installing Monosi (CLI)

There are two ways of using Monosi - a CLI and a web interface. We will first run through the CLI workflow, then showcase the web UI.

Monosi's CLI is a Python package that is distributed through pypi.

Note: Monosi CLI is only compatible with Python 3.6 and higher.

To install it, open your terminal and run:

pip install monosi

Check that it's been installed by running:

monosi --version

Configuring a data source

With Monosi installed, we need to configure a data source to monitor. Monosi reads connection details from the ~/.monosi/workspaces.yml file, so let's create it:

mkdir ~/.monosi
touch ~/.monosi/workspaces.yml

Edit the workspaces.yml file in your editor of choice and fill it out with the public database information. This is what the file will look like:

default:
    sources:
        default:
            type: postgres
            user: reader
            password: NWDMCE5xdipIjRrp
            host: hh-pgsql-public.ebi.ac.uk
            port: 5432
            database: pfmegrnargs

You can find more information on setting up the workspaces.yml file here.

Creating a Monosi project

Navigate to the directory where you want your Monosi project to live and create a project repository by running:

monosi init

In the directory you should now see a folder called monosi-repo (you can rename this if you want). Navigate into the folder by running:

cd monosi-repo

In the monosi-repo directory, you should see a monosi_project.yml file.

This file configures which connection to use and your monitor paths, as well as other metadata. More information on setting up the monosi_project.yml file can be found here.

For the purposes of this tutorial, we don't need to edit the file.

Creating a monitor

Monosi automatically creates a folder called monitors in the monosi-repo directory. In that folder, an example custom SQL monitor is defined to show how the syntax works.

The example public database is populated with 40+ tables. To take a look at how the schema is structured, click here. For the purposes of this example, we will create a monitor on one of the tables, specifically auth_user. To do so, run:

touch ./monitors/auth_user.yml

Edit the auth_user.yml file in your editor of choice and paste the following:

monosi:
  monitors:
  - name: pfmegrnargs.rnacen.auth_user - Table Health
    description: Monitoring the health of the auth_user table
    type: table
    table: pfmegrnargs.rnacen.auth_user
    timestamp_field: last_login

This creates a monitor as code on the provided auth_user table. It monitors for table health metrics.

Run the monitors

Start the monitor by running the following command in the monosi-repo directory:

monosi run

It takes a few seconds to run. The resulting output should be similar to:

🎉 Congratulations, you just ran your first Monosi monitor! From the output of the run, you should see that the public data on the auth_user table has no anomalies that the Monosi monitor has detected.

Installing Monosi (Web Interface)

Note: At the moment, the web interface is decoupled from the CLI, so any monitors that are created in the CLI will not be synced.

With the release of v0.0.3, Monosi has a web user interface. For a full walkthrough of the UI, watch the video here.

Monosi provides a docker image to run the web application. If you're not familiar with docker, follow the tutorial here. Make sure you have docker installed and ready to use by running:

docker ps

To get the Monosi UI up and running with docker, run the following command:

docker run -p 3000:3000 monosi/monosi

Navigate to http://localhost:3000 and you will see the Monosi UI.

Set up a connection to the public PostgreSQL instance in the UI by navigating to http://localhost:3000/settings/sources.

Fill out the database connection form with the following values:

Name for Data Source: RNAcentral Public Database
User: reader
Password: NWDMCE5xdipIjRrp
Host: hh-pgsql-public.ebi.ac.uk
Port: 5432
Database: pfmegrnargs

After setting up the connection, create a monitor in the UI by navigating to http://localhost:3000/monitors and clicking the Create Monitor button. Fill out the form with the following information:

Name: auth_user table health
Description: auth_user table health monitor
Check every: 720 minutes
Monitor type: Table Health
Data Source: RNAcentral Public Database
Table: pfmegrnargs.rnacen.auth_user
Timestamp Field: last_login

Hit save and your new monitor will appear in the monitors table.

🎉 Congratulations, you've just set up and scheduled a data monitor! This process will run indefinitely until you delete the monitor. To get alerts on detected anomalies, set up a Slack connection in http://localhost:3000/settings/integrations

Wrapping up

Now that you’ve worked through an example using a public PostgreSQL instance, you can further extend this to your own data store. For more information, get started here.

If you have any questions, join our Slack community or open an issue in our repository on Github.

DEV Community