I recently upgraded our analytics database at The Org to PostgreSQL 13.1 so we could periodically refresh materialized views with pg_cron, running the jobs directly in the database. Our previous setup used CloudWatch events that triggered a Lambda to run the refresh command.
Upgrading to PostgreSQL 12.5+
Amazon RDS supports pg_cron from version 12.5 and higher (source). If you are running an older version, you will need to upgrade it.
You can check the version in the AWS control panel or by running:
SELECT version();
If you are running version 12.5 or newer, just skip to the next section.
Otherwise, you need to upgrade your database. Navigate to your DB instance and set the DB engine version to PostgreSQL 13.1-R1 to upgrade it.
The next step requires updating your parameter group, which also requires a reboot, so it can be beneficial to do both of these steps at once to reduce downtime.
A major version upgrade takes 10-20 minutes and makes your database temporarily unavailable. I will not cover strategies for avoiding downtime here.
Update the parameter group
Go to parameter groups and change the value of shared_preload_libraries to include pg_cron. Note that you cannot modify default parameter groups.
If you are using the default.postgres13 parameter group then create a new parameter group with Parameter group family set to postgres13. This creates a duplicate of the template that you can then modify.
Navigate to the RDS instance and click modify. Set the DB parameter group under additional configurations to use the group you just created. Skip this step if you simply modified an existing parameter group.
Apply the changes. This requires a quick reboot of the database.
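Once the reboot is done, a quick way to verify that the setting took effect is to inspect it from a SQL session; the output should include pg_cron:
/* Verify that pg_cron is listed among the preloaded libraries. */
SHOW shared_preload_libraries;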
Install pg_cron
Connect to your database as an rds_superuser. The default admin account (usually called postgres) is part of this role by default. Once connected, run the following SQL in the postgres database:
CREATE EXTENSION pg_cron;
If successful, you should see a schema named cron in the postgres database containing the job and job_run_details tables. Common errors if it fails include:
ERROR: pg_cron can only be loaded via shared_preload_libraries
The parameter group was not properly updated. Ensure you updated the right parameter group and applied the change to the database. The database reboot might be pending.
ERROR: permission denied to create extension "pg_cron"
The user you authenticated as does not have permission to install pg_cron. Ensure you are using an account with admin privileges. Both postgres and rdsadmin should do.
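You can check whether the account you are connected with actually has the rds_superuser role with a quick membership query:
/* Returns true if the current user is a member of rds_superuser. */
SELECT pg_has_role(current_user, 'rds_superuser', 'member');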
ERROR: can only create extension in database postgres
You connected to the wrong database. Ensure you are running your command against the postgres database.
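If the extension installed cleanly, a quick sanity check is to confirm that it is registered and that the cron schema is queryable (the job table will simply be empty at this point):
/* Confirm the extension is installed and the cron schema exists. */
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_cron';
SELECT * FROM cron.job;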
Add a Cron Job using pg_cron
Once pg_cron is installed you can start scheduling SQL commands using the following syntax:
SELECT cron.schedule(SCHEDULE, COMMAND);
SELECT cron.schedule(JOB_NAME, SCHEDULE, COMMAND);
These cron jobs run in the postgres database by default. Our use case for pg_cron is to periodically refresh a materialized view in our main database, so we need to change the database in which the command is executed. This is done by first scheduling the job and then updating the database column in the cron.job table.
SELECT cron.schedule(
'Refresh weekly analytics',
'5 0 * * 1',
$$REFRESH MATERIALIZED VIEW web_client.company_weekly_analytics$$
);
UPDATE cron.job SET database = 'theorg' WHERE jobid = 1;
This will run the command each Monday at 0:05 in our main database.
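You can confirm that the job was registered and that the database column now points at your main database (theorg in our case; substitute your own database name) with something like:
/* Check the scheduled job and the database it will run in. */
SELECT jobid, schedule, database, command FROM cron.job;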
Crash course in pg_cron
The concept behind pg_cron is fairly simple. Scheduled jobs are stored in the cron.job table along with the command to execute and when to run it. The schedule is expressed as a regular cron expression - you can use crontab.guru to generate correct cron expressions.
/* See all scheduled jobs. */
SELECT * FROM cron.job;
/* Query to confirm that it was successful and check running times. */
SELECT * FROM cron.job_run_details;
/* Schedule a job */
SELECT cron.schedule(JOB_NAME, SCHEDULE, COMMAND);
/* Unschedule a job */
SELECT cron.unschedule(JOB_ID);
/* Update cron expression - this would run daily at 3:14 */
UPDATE cron.job SET schedule = '14 3 * * *' WHERE jobid = 1;
Each run will add an entry to the cron.job_run_details table. Query it to confirm that your cron job got executed successfully.
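For example, a query along these lines (a sketch assuming the standard job_run_details columns) shows the most recent runs and whether they succeeded:
/* Inspect the most recent runs and their status. */
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;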
Disclaimer
Consider whether running a cron job directly in your database is the right solution for your use case. It can clutter up your database, drain resources and hide business-critical logic from your codebase, potentially resulting in some really nasty debugging sessions for your colleagues.
I would personally not recommend running clean-ups or data-modifying jobs directly in the database. Our use case is ensuring that our materialized views (which have to be materialized for performance reasons) stay up to date. Our alternative approach was using CloudWatch events to periodically trigger an AWS Lambda that connected to the database and refreshed the view.
Running a Lambda to refresh the views adds cost and complexity and involves more moving parts that can fail. That approach is not exactly obvious either, as you have to know the Lambda exists and is triggered periodically. Using pg_cron is the lesser evil in our case.