loading...
Cover image for Have you VACUUMed your tables lately?

Have you VACUUMed your tables lately?

helenanders26 profile image Helen Anderson Originally published at helenanderson.co.nz ・3 min read

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Dishing the dirt on VACUUM

VACUUM is one of the tasks your DBA need to perform to keep things running smoothly in a Postgres database.

This is a high-level overview of why this is important.


Multiversion Concurrency Control

The first thing to get to grips with is the concept of Multiversion Concurrency Control (MVCC). The Postgres docs say that this is a way to protect “reading from blocking writing and writing from blocking reading“, by providing each transaction with a snapshot of the data.

While this is great when you’re querying your data, no one gets locked. The trade-off is that all these snapshots float around only marked for deletion.

They are not automatically deleted. These are called ‘dead tuples’, ‘dead rows’ or ‘bloat’ because they clog things up and slow queries down.


Alt Text


How does this work?

VACUUM

Using VACUUM is a way to deal with the garbage collection that needs to be done to keep things running smoothly.

This option only clears out the unused data but doesn’t rewrite to disk.

VACUUM [tablename]  
-- to specify a table

VACUUM [tablename.columnname]  
-- to specify a table and column

VACUUM FULL

This is the most no-nonsense way to get the job done. The tradeoff is that VACUUM FULL puts a full lock on the table. Not a great option if you have anyone trying to SELECT as you carry this task out.

VACUUM(FULL) [tablename]
-- to specify a table

VACUUM(FULL) [tablename.columnname] 
-- to specify a table and column

VACUUM ANALYZE

This clears out unused data and updates the query plan.

VACUUM(FULL, ANALYZE) [tablename] 
-- to specify a table

VACUUM(FULL, ANALYZE) [tablename.columnname] 
-- to specify a table and column

AUTOVACUUM

This option keeps things under control automatically by using a trigger to kick off VACUUM when it reaches a certain level. When it is exceeded the VACUUM begins.

vacuum base threshold +

vacuum scale factor *

number of tuples

autovacuum_vacuum_threshold = 50 
-- the threshold of 50 rows is set to prevent 
small tables being overcleaned

autovacuum_vacuum_scale_factor = 0.2 
-- 20% of a table may have dead tuples

So I should clean up all the time?

The aim here is to keep disk space usage at a steady-state, not to knock it down to a minimum.

Not necessarily.

Performing these tasks uses resources, and depending on which strategy you have in mind, could lock users out while it happens.

The advantage of AUTOVACUUM is that is throttled, so it doesn’t use all your resources.


How do I know when it last ran?

Using the query below you can check on when the process ran by using the pg_stat_user_tables table:

select 
  relname, 
  last_vacuum, 
  last_autovacuum 
from
pg_stat_user_tables;

To tell if AUTOVACUUM is running you can use this query using the pg_stat_activity table:

select
  datname, 
  usename, 
  pid, 
  state, 
  wait_event, 
  current_timestamp - xact_start AS xact_runtime, 
  query
from 
  pg_stat_activity 
where 
  upper(query) LIKE '%VACUUM%' 
order by 
  xact_start;

Your friendly DBA will be across the details when it comes to this process and other maintenance tasks. Hopefully, this gives you a good introduction to the concepts and demystifies why you may need to log out so the VACCUUMing can be done.


Useful Links:

Postgres documentation – Vacuum
Postgres documentation – Autovacuum
Amazon RDS documentation


Read more


This post originally appeared on helenanderson.co.nz

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Posted on by:

helenanders26 profile

Helen Anderson

@helenanders26

Making applications go faster at Raygun, AWS Data Hero, and tag moderator on Dev.to. Database concept you don’t understand? Let me know, I’ll write a post!

Discussion

markdown guide
 

Postgres' auto vacuum settings are good enough. At some point in time, when your database grows with plenty data you might want to look into the auto vacuum behavior of each individual table.
For each table to can override de default settings to more suitable values for the way that table is used. For example, a small table with a lot of writes you probably want to reduce the auto vacuum threshold so that it's not done every few seconds or minutes.
In some cases you might even want to disable auto vacuum and schedule a crib job to explicitly vacuum those tables outside of peek ours. Even though auto vacuum tries is non blocking and runs on a low priority, it will produce a lot of IO, which affects disc buffers, and if the storage is non-local also network traffic.

I can't really go into the details for tuning the auto vacuum settings, as I have not really done this. It's on my to-do list, but so far our production servers are running along nicely.

Tuning the auto vacuum settings is performance optimization. So the standard rules apply:

  1. Don't.
  2. Don't Yet (for experts only