Team Timescale for Timescale

7 Ways to Fix PostgreSQL Database Bloat

If your queries are slowing down or your database is eating up too much storage, there are simple steps you can take to fix it. Here are 7 ways to improve database performance and save on storage costs.

1. Use Columnar Compression

PostgreSQL has some built-in compression (for example, TOAST compresses oversized column values), but if you want to take data compression further, especially for time-series data, you should use Timescale’s columnar compression.

💡 How Timescale Compression Works

Timescale lets you dramatically compress data through its add_compression_policy() function. To achieve high compression rates, it applies different compression techniques depending on each column's data type, reducing your data footprint.

As an example:

Let’s say we have a hypertable with a week's worth of data. Imagine that our application generally only needs data from the last day, but we must keep historical data around for reporting purposes. We could run SELECT add_compression_policy('my_table', INTERVAL '24 hours'); which automatically compresses rows in the my_table hypertable older than 24 hours.

Timescale’s compression combines many rows into a single compressed row, where each column stores an array of the original values, in batches of up to 1,000 rows. Visually, this would take a table that looks like this:

| time                   | location | temperature |
|------------------------|----------|-------------|
| 2023-09-20 00:16:00.00 | garage   | 80          |
| 2023-09-21 00:10:00.00 | attic    | 92.3        |
| 2023-09-22 00:05:00.00 | basement | 73.9        |

And compress it down to a table like this:


| time                                                                      | location                  | temperature      |
|---------------------------------------------------------------------------|---------------------------|------------------|
| [2023-09-20 00:16:00.00, 2023-09-21 00:10:00.00, 2023-09-22 00:05:00.00]  | [garage, attic, basement] | [80, 92.3, 73.9] |
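
For context, here is a minimal sketch of the full workflow. Compression has to be enabled on the hypertable before a policy can be added; the segmentby and orderby settings below are assumptions based on the example table's columns:

-- Enable compression on the hypertable first; segmenting by location and
-- ordering by time are assumptions based on the example schema above.
ALTER TABLE my_table SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'location',
    timescaledb.compress_orderby = 'time DESC'
);

-- Then add the policy so chunks older than 24 hours are compressed automatically.
SELECT add_compression_policy('my_table', INTERVAL '24 hours');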

2. Monitor dead tuples

Dead tuples are old row versions left behind by UPDATE and DELETE operations, and they waste storage space in PostgreSQL tables. Install and use the pgstattuple extension to identify table bloat and track cleanup needs:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

If you run the following query,

SELECT * FROM pgstattuple('my_table');

Postgres would give you a table of helpful information in response:

 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
  81920000 |      500000 |  40000000 |          48.8 |            10000 |        1000000 |                1.2 |     300000 |          0.4
  • table_len tells you the physical size of the table's heap in bytes, including free space within its pages. Indexes and TOAST tables are separate relations that you can inspect with their own pgstattuple calls.
  • dead_tuple_len tells you how much space is occupied by dead tuples, which can be reclaimed by vacuuming.
  • free_space indicates the unused space within the allocated pages of the table. It is tracked per page, so each newly created page adds its own slack to the total.

You can also perform calculations or transformations on the result to make the information more understandable. For example, this query calculates the ratios of dead tuples and free space to the total table length, giving you a clearer perspective on the storage efficiency of your table:

SELECT
    (dead_tuple_len * 100.0 / table_len) AS dead_tuple_ratio,
    (free_space * 100.0 / table_len) AS free_space_ratio
FROM
    pgstattuple('my_table');
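
Note that pgstattuple scans the entire table, which can be expensive on large relations. For lightweight, continuous monitoring, the pg_stat_user_tables statistics view keeps a running estimate of dead tuples; a quick sketch:

-- Estimated dead tuples per table, without scanning the heap.
SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    last_autovacuum
FROM
    pg_stat_user_tables
ORDER BY
    n_dead_tup DESC;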

3. Run autovacuum for storage savings

PostgreSQL's autovacuum removes dead tuples and reclaims space. Instead of modifying global settings in postgresql.conf, configure it per table:

ALTER TABLE my_table SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 200);

This configures my_table so that autovacuum runs once more than 200 tuples have been updated or deleted, regardless of the table's size (the scale factor of 0 removes the percentage-based part of the threshold).

Additional autovacuum settings are covered in the PostgreSQL documentation.
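
To double-check which per-table settings are in effect, you can read them back from the system catalog; a small sketch using pg_class, which stores per-table storage options in its reloptions column:

-- Show the storage options (including autovacuum overrides) set on my_table.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'my_table';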

4. Reclaim unused pages with pg_repack

When autovacuum isn't enough, VACUUM FULL can reclaim unused pages, but it takes an exclusive lock on your table while it runs. For production databases, use pg_repack instead. This PostgreSQL extension cleans up table bloat with minimal locking by creating a fresh copy of your table and then swapping it in for the original. The result: reclaimed storage with minimal downtime.
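
A minimal sketch of how pg_repack is typically run (the database name mydb is a placeholder; the extension must be installed in the target database and the pg_repack client binary available on the host):

-- Install the extension in the target database first.
CREATE EXTENSION IF NOT EXISTS pg_repack;

-- Then, from the shell, rebuild a single table online:
--   pg_repack --dbname=mydb --table=my_table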

5. Find unused indexes

As we mention in this article on indexing design, over-indexing is a frequent issue in many large PostgreSQL databases. Indexes consume disk space, so removing unused or underutilized indexes will help you keep your PostgreSQL database lean.

You can use pg_stat_user_indexes to spot opportunities:

SELECT
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS index_scan_count
FROM
    pg_stat_user_indexes
WHERE
    idx_scan < 50 -- Choose a threshold that makes sense for your application.
ORDER BY
    index_scan_count ASC,
    pg_relation_size(indexrelid) DESC;

(This query looks for indexes with fewer than 50 scans, but this is an arbitrary number. You should adjust it based on your own usage patterns.)
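
Once you've confirmed an index really is unused (the index name below is a placeholder), dropping it concurrently avoids blocking writes to the table:

-- Drop an unused index without taking a lock that blocks writes.
-- Note: DROP INDEX CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS my_unused_index;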

6. Arrange columns by data type (from largest to smallest)

PostgreSQL aligns column values on type-specific byte boundaries, both on disk and in memory, which can leave unused padding bytes between columns. Ordering columns from largest to smallest data type in your table definition minimizes this padding.

Learn more about optimizing database schema design here.
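
As an illustration (the table and column names here are made up), interleaving small and large types forces alignment padding before each 8-byte column, while grouping the large types first lets the small ones pack together:

-- Small and large types interleaved: 7 padding bytes are inserted before
-- each 8-byte column, so the column data occupies 32 bytes per row.
CREATE TABLE events_padded (
    is_active  boolean,   -- 1 byte, followed by 7 padding bytes
    user_id    bigint,    -- 8 bytes, aligned on an 8-byte boundary
    is_deleted boolean,   -- 1 byte, followed by 7 padding bytes
    event_id   bigint     -- 8 bytes
);

-- Largest types first: the two booleans pack together at the end, so the
-- same columns fit in 24 bytes per row.
CREATE TABLE events_compact (
    user_id    bigint,
    event_id   bigint,
    is_active  boolean,
    is_deleted boolean
);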

7. Delete old data regularly

Most apps don't need to store data forever. You should always ask yourself: how long should I keep data around?

Setting up data retention policies is essential for managing storage appropriately. TimescaleDB (a PostgreSQL extension) can automatically delete old data using retention policies.

You can create a retention policy by running:

SELECT add_retention_policy('my_table', INTERVAL '24 hours');

With this policy in place, Timescale automatically drops chunks of my_table once their data is older than 24 hours.
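
If you aren't using TimescaleDB, you can get a similar effect with a scheduled plain-SQL delete (a sketch assuming the timestamptz column is named time; run it from cron, pg_cron, or your scheduler of choice). Keep in mind that ordinary deletes create dead tuples that vacuum then has to clean up, which is why chunk-based retention is cheaper:

-- Delete rows older than 24 hours; assumes a timestamptz column named "time".
DELETE FROM my_table
WHERE time < now() - INTERVAL '24 hours';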

Key Takeaways

Table bloat wastes storage space and slows query performance. Implement these techniques to keep your PostgreSQL database lean and your database costs low: compress historical data, set retention policies, and maintain regular cleanup routines.

For detailed setup instructions and best practices, dive into the full article.
