Resolving Storage Space Issues on AWS RDS Postgres

For a DevOps engineer, maintaining optimal database performance is crucial. Recently, we encountered a persistent issue with our Postgres RDS instance: free storage space was decreasing rapidly, and finding the root cause wasn't straightforward. This post outlines the investigation process, the routes we explored, and how we ultimately resolved the issue.

1. Temporary Files

One of the first things we checked was the use of temporary files. Large temporary files can consume significant storage, especially when work_mem is set too low or when long-running queries make heavy use of temp space. We ran the following query to check temporary file usage:

SELECT datname, temp_files AS "Temporary files", temp_bytes AS "Size of temporary files" 
FROM pg_stat_database;
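If temp usage looks suspicious, it also helps to confirm the settings that control when queries spill to disk. A minimal sketch using the standard pg_settings view (work_mem, temp_file_limit, and log_temp_files are ordinary PostgreSQL parameters, shown here only for illustration):

-- Settings that govern when and how much queries spill to temporary files
SELECT name, setting, unit 
FROM pg_settings 
WHERE name IN ('work_mem', 'temp_file_limit', 'log_temp_files');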

However, in our case this was ruled out early because our configuration uses a storage-optimized instance. Temporary files were written to local instance storage rather than to the database's EBS volume, so they were not contributing to the space issue on the database storage.

2. Dead Tuples

Next, we considered dead tuples, which can accumulate over time and bloat the database size. Dead tuples occur when rows are marked for deletion or updated, but the space they occupy isn’t immediately reclaimed. This can lead to bloat, where disk space is consumed by data that’s no longer needed but remains invisible to transactions. We checked the tables with the most dead tuples using:

SELECT relname, n_dead_tup 
FROM pg_stat_all_tables 
ORDER BY n_dead_tup DESC 
LIMIT 10;

Although we found a few million dead tuples, running a vacuum didn't recover as much space as anticipated. This indicated that while dead tuples were present, they weren't the main factor behind the excessive storage consumption.
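To put those numbers in context, it can help to compare dead tuples against live tuples and check when autovacuum last ran. A hedged sketch built on the same statistics view:

-- Dead vs. live tuples and the last (auto)vacuum run per table
SELECT relname, 
       n_live_tup, 
       n_dead_tup, 
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct, 
       last_vacuum, 
       last_autovacuum 
FROM pg_stat_all_tables 
ORDER BY n_dead_tup DESC 
LIMIT 10;

Keep in mind that a plain VACUUM only marks dead space as reusable inside the table files; returning it to the operating system generally requires VACUUM FULL or a tool like pg_repack, both of which come with their own trade-offs.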

3. Orphaned Files

We also checked for orphaned files, which can occur if files remain in the database directory without corresponding objects pointing to them. This situation might happen if the instance runs out of storage or the engine crashes during operations like ALTER TABLE, VACUUM FULL, or CLUSTER. To check for orphaned files, we compared the database size occupied by files against the real size retrieved by summing the objects:

-- Size of the database occupied by files
SELECT pg_size_pretty(pg_database_size('DATABASE_NAME'));

-- Size of database retrieved by summing the objects (real size)
SELECT pg_size_pretty(SUM(pg_relation_size(oid))) 
FROM pg_class;

If a significant difference exists between these sizes, orphaned files might be using up storage space. In our case the two sizes matched, so orphaned files were not the cause.
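The comparison can also be expressed as a single query. A rough sketch (the subtraction is only an approximation, since pg_database_size also counts free space maps, visibility maps, and other per-database files):

-- Approximate gap between on-disk database size and the sum of relation sizes
SELECT pg_size_pretty(
         pg_database_size(current_database())
         - SUM(pg_relation_size(oid))::bigint
       ) AS approximate_gap 
FROM pg_class;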

4. Replication Slots

Replication slots are another common cause of storage bloat: they force the instance to retain WAL (Write-Ahead Log) segments until the consumer catches up, so an abandoned or lagging slot can fill the disk with WAL. However, upon checking, we confirmed that replication was not in use and no replication slots were active on our instance.
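A query along these lines can confirm it; a minimal sketch that lists any slots and roughly how much WAL each one is holding back (pg_current_wal_lsn and pg_wal_lsn_diff are available from PostgreSQL 10):

-- Replication slots and the amount of WAL retained on their behalf
SELECT slot_name, 
       slot_type, 
       active, 
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal 
FROM pg_replication_slots;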

5. Log Files

The breakthrough came when we investigated log files. We discovered that our log files were consuming an enormous amount of space:

  • Log sizes over the past three days included 20GB, 18GB, 10GB, and 8GB files.

Our logging configuration was set to log queries that ran for a minimum of 1 second, including their bind parameters. This resulted in very large log entries, especially when complex queries with extensive parameters were executed frequently.
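For anyone wanting to check log sizes from a SQL session, pg_ls_logdir() (PostgreSQL 10+, and restricted to superusers, pg_monitor members, or roles with an explicit grant) can list the server log files; a hedged sketch:

-- Server log files and their sizes, largest first
SELECT name, 
       pg_size_pretty(size) AS file_size, 
       modification 
FROM pg_ls_logdir() 
ORDER BY size DESC;

On RDS the same information is also visible in the console under the instance's Logs & events tab, or via the aws rds describe-db-log-files CLI command.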

The Solution

To resolve the storage issue, we made the following adjustments:

  1. Increased Minimum Run Time for Logging (log_min_duration_statement): We changed the minimum run time for logging queries from 1 second to 5 seconds. This reduced the number of logged queries significantly, focusing on longer-running, potentially more impactful queries.

  2. Adjusted Log Retention Policy (rds.log_retention_period): We reduced the retention period of log files from the default 3 days to just 1 day. This change immediately helped in freeing up storage space by reducing the volume of retained log data.

Also worth mentioning is the log_parameter_max_length parameter (available from PostgreSQL 13), which can limit the length of logged bind parameters.
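After applying the parameter group changes, the effective values can be double-checked from a psql session. A small sketch (rds.log_retention_period is an RDS-specific parameter and won't exist on vanilla PostgreSQL; log_parameter_max_length requires PostgreSQL 13+):

-- Confirm the logging-related settings after the parameter group change
SELECT name, setting, unit 
FROM pg_settings 
WHERE name IN ('log_min_duration_statement', 
               'rds.log_retention_period', 
               'log_parameter_max_length');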

These changes resulted in a significant reduction in the size of the log files, freeing up storage space and preventing further rapid consumption.

Conclusion

When faced with storage space issues in PostgreSQL RDS, it’s crucial to systematically investigate all possible sources of storage consumption. In our case, the log files were the unexpected culprit, but a strategic adjustment to logging parameters and retention policies helped us reclaim valuable storage space.
