DEV Community

Hiren Dhaduk

Top Things to Monitor in Amazon RDS

Amazon RDS makes it easier for websites and applications to scale to thousands of requests per second and terabytes of data under management. However, if you are moving from a classic on-premises MySQL or PostgreSQL setup, monitoring Amazon RDS may require changes to your observability strategy.

Monitoring is integral to maintaining the availability, reliability, and performance of Amazon RDS and the databases it hosts. Collecting monitoring data from every part of your database on AWS makes multi-point failures much easier to debug. In this post, we will discuss the important things to monitor in Amazon RDS.

Top metrics for Amazon RDS monitoring

An application team must identify a list of critical metrics and automate alerting for when those metrics breach an acceptable threshold. The key metrics for Amazon RDS monitoring fall into three areas: performance, availability, and cost.
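The threshold idea can be sketched in a few lines of Python. This is a minimal illustration, not a production alerting setup: the metric names match CloudWatch's RDS metrics (CPUUtilization, FreeStorageSpace, DatabaseConnections), but the limits, the `Threshold` class, and the `find_breaches` helper are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str       # CloudWatch metric name
    limit: float      # hypothetical acceptable limit
    breach_when: str  # "above" or "below"


# Illustrative limits only; tune them for your own workload.
THRESHOLDS = [
    Threshold("CPUUtilization", 80.0, "above"),            # percent
    Threshold("FreeStorageSpace", 10 * 1024**3, "below"),  # bytes
    Threshold("DatabaseConnections", 400, "above"),        # count
]


def find_breaches(latest, thresholds):
    """Return the names of metrics whose latest value breaches its limit."""
    breaches = []
    for t in thresholds:
        value = latest.get(t.metric)
        if value is None:
            continue  # no datapoint for this metric, nothing to evaluate
        too_high = t.breach_when == "above" and value > t.limit
        too_low = t.breach_when == "below" and value < t.limit
        if too_high or too_low:
            breaches.append(t.metric)
    return breaches
```

In practice the `latest` dictionary would be filled from your metrics source, and each breach would trigger a notification rather than just being returned.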

1. Performance

Nearly every type of app runs into performance limits as it scales up. It is therefore imperative to determine which limits you need to monitor and optimize. Such performance improvements depend on a robust monitoring system that can identify critical bottlenecks so you can eliminate them.

Consider the following performance metrics when monitoring Amazon RDS:

RDS Metrics

You need to detect whether your database is heavier on writes or on reads. If it is write-heavy, you can benefit from reducing table locking and from removing unused indexes and foreign key constraints that slow down writes. On the other hand, if it is read-heavy, you may gain from read replicas, upstream caches, or materialized views.
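One rough way to make the read/write split concrete is to compare average read and write IOPS, which CloudWatch exposes for RDS as ReadIOPS and WriteIOPS. The sketch below assumes you have already fetched those averages; the `classify_workload` name and the 70% dominance cut-off are arbitrary choices for illustration.

```python
def classify_workload(read_iops, write_iops, dominance=0.7):
    """Label a workload from average ReadIOPS and WriteIOPS values.

    `dominance` is a hypothetical cut-off: the share of total I/O one
    side needs before the workload is called read- or write-heavy.
    """
    total = read_iops + write_iops
    if total <= 0:
        return "idle"
    read_share = read_iops / total
    if read_share >= dominance:
        return "read-heavy"
    if read_share <= 1 - dominance:
        return "write-heavy"
    return "mixed"
```

A "mixed" result suggests that neither set of optimizations is an obvious win, so it is worth looking at individual queries first.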

Slow Query Log

The best way to track which statements take the most time in production is to look at your slow query log. You must turn it on explicitly, as this log is not kept by default; depending on the engine and configuration, it is written to a separate database table or to a log file. Once you detect time-consuming statements, you can get the query plan that the database uses to execute each one and try to optimize it.
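Once slow-log entries are available, ranking them by total time is straightforward. The sketch below assumes each entry has already been loaded as a dictionary with `sql_text` and `query_time` keys (names borrowed from MySQL's `mysql.slow_log` table) and that `query_time` has been converted to seconds; the `slowest_statements` helper is hypothetical.

```python
from collections import defaultdict


def slowest_statements(rows, top_n=3):
    """Rank statements by total time spent across all slow-log entries.

    Returns a list of (sql_text, total_seconds, occurrences) tuples,
    slowest first.
    """
    totals = defaultdict(lambda: [0.0, 0])  # sql_text -> [seconds, count]
    for row in rows:
        entry = totals[row["sql_text"]]
        entry[0] += row["query_time"]
        entry[1] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(sql, total, count) for sql, (total, count) in ranked[:top_n]]
```

Each statement it returns is a candidate for running with `EXPLAIN` to inspect the query plan.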

Instance Metrics

If you hit a CPU limit on a burstable (shared-CPU) instance, check your CPU credit balance. You might also hit the ceiling on I/O operations per second (IOPS) or on volume throughput. Each instance type also has a different amount of network throughput, so it is easy to overlook these resource dependencies.
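For burstable instances, a simple projection shows how long the current credit balance will last. This is a back-of-the-envelope sketch: the inputs correspond to the CloudWatch metrics CPUCreditBalance and CPUCreditUsage plus the earn rate of your instance class, and the `hours_until_credits_exhausted` helper is an assumption for illustration.

```python
def hours_until_credits_exhausted(balance, earned_per_hour, used_per_hour):
    """Estimate hours until the CPU credit balance reaches zero.

    Returns None when credits are earned at least as fast as they are
    spent, meaning the balance is not draining.
    """
    net_drain = used_per_hour - earned_per_hour
    if net_drain <= 0:
        return None
    return balance / net_drain
```

A small result is an early warning that the instance is about to be throttled to its baseline CPU performance.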

2. Availability

You need to be notified when there is a database outage, congestion, or a downward trend in availability, as these may impair the application's performance and can eventually result in data loss.

Maintenance Windows

Amazon RDS schedules a weekly maintenance window in which security patches and other updates are applied. Most maintenance can be completed without affecting performance, while some updates may require a Multi-AZ failover. Still, there is a real chance that an upgrade could cause a performance issue or an unexpected problem with the application.
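When an alert fires, it helps to know whether it happened inside the maintenance window. RDS expresses the window in UTC in the form `ddd:hh24:mi-ddd:hh24:mi` (for example `sun:05:00-sun:06:00`). The following sketch checks a timestamp against such a string; the `in_maintenance_window` helper is hypothetical and expects a timezone-aware datetime.

```python
from datetime import datetime, timezone

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def _minute_of_week(day, hhmm):
    """Convert a day name and hh:mm string to minutes since Monday 00:00."""
    hours, minutes = hhmm.split(":")
    return DAYS.index(day) * 1440 + int(hours) * 60 + int(minutes)


def in_maintenance_window(window, when):
    """Return True if the timezone-aware datetime `when` falls in `window`."""
    start_raw, end_raw = window.lower().split("-")
    start = _minute_of_week(start_raw[:3], start_raw[4:])
    end = _minute_of_week(end_raw[:3], end_raw[4:])
    when = when.astimezone(timezone.utc)  # the window is defined in UTC
    now = when.weekday() * 1440 + when.hour * 60 + when.minute
    if start <= end:
        return start <= now < end
    # The window wraps around the end of the week (e.g. sun night to mon).
    return now >= start or now < end
```

An alerting pipeline could use this to annotate incidents that coincide with maintenance instead of paging someone immediately.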

Human Errors

Sometimes an accidental change made by a team member can take your database offline. Because AWS CloudTrail records the API calls made against your account, you can review the CloudTrail log to audit those changes. A misconfigured schema or an out-of-sync cluster can also make the database unavailable. To pinpoint the actual issue, watch for errors when making schema or cluster changes.
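Auditing changes can start with a simple filter over CloudTrail records. The sketch below assumes the records have already been loaded as dictionaries; `eventSource`, `eventName`, `eventTime`, and `userIdentity` are standard CloudTrail record fields, while the set of events treated as risky and the `risky_rds_changes` helper are choices made for this example.

```python
# RDS API actions that can take an instance offline or change its behavior.
# This selection is illustrative, not exhaustive.
RISKY_RDS_EVENTS = {
    "DeleteDBInstance",
    "ModifyDBInstance",
    "RebootDBInstance",
    "StopDBInstance",
    "ModifyDBParameterGroup",
}


def risky_rds_changes(records):
    """Return (eventTime, eventName, caller ARN) for risky RDS API calls."""
    hits = []
    for record in records:
        if record.get("eventSource") != "rds.amazonaws.com":
            continue  # not an RDS API call
        if record.get("eventName") in RISKY_RDS_EVENTS:
            who = record.get("userIdentity", {}).get("arn", "unknown")
            hits.append((record.get("eventTime"), record["eventName"], who))
    return hits
```

Correlating the returned timestamps with the start of an outage is often the quickest way to confirm or rule out a manual change.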

Resource Usage and System Errors

Databases might become unavailable after hitting resource usage limits, such as table size limits in MySQL, the maximum allowed database connections per instance, or capacity constraints during periods of peak demand. It is therefore advisable to check your database error log in the RDS console, which helps you detect errors that are internal to the database.
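If you export the error log, a small scanner can count the resource-limit errors described above. The patterns below are based on common MySQL error messages and are only a starting point; the labels and the `count_resource_errors` helper are assumptions for illustration.

```python
import re
from collections import Counter

# Illustrative patterns based on common MySQL error messages.
PATTERNS = {
    "connection_limit": re.compile(r"Too many connections", re.IGNORECASE),
    "table_full": re.compile(r"The table '.*' is full", re.IGNORECASE),
    "out_of_memory": re.compile(r"Out of memory", re.IGNORECASE),
}


def count_resource_errors(lines):
    """Count error-log lines that match each resource-limit pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return dict(counts)
```

A rising count for any label is a signal to raise the relevant limit or reduce demand before the database becomes unavailable.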

3. Cost

Amazon RDS is quite elastic when it comes to scaling up. Still, managing your costs in the long term requires a smarter approach. Here, an effective monitoring system is key to helping your team keep costs from becoming exorbitant and to addressing critical issues.

Instance Type

You can choose lower-cost Amazon RDS instance types and scale up as and when required. If you have a steady baseline capacity need, you can purchase reserved instances at a lower cost. Furthermore, you can take advantage of off-hours to run batch jobs on read replicas.
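Whether a reserved instance pays off depends on how many hours the instance actually runs. The following sketch assumes a reservation is paid for every hour of the term while on-demand is paid only for running hours; the prices in the usage note are made up, and both helper names are hypothetical.

```python
def reserved_break_even_utilization(on_demand_hourly, reserved_effective_hourly):
    """Fraction of hours an instance must run for a reservation to pay off."""
    return reserved_effective_hourly / on_demand_hourly


def cheaper_option(utilization, on_demand_hourly, reserved_effective_hourly):
    """Return "reserved" or "on-demand" for a given utilization (0.0 to 1.0)."""
    break_even = reserved_break_even_utilization(
        on_demand_hourly, reserved_effective_hourly
    )
    return "reserved" if utilization > break_even else "on-demand"
```

With hypothetical rates of 0.20 per hour on demand and 0.13 per hour effective for a reservation, the break-even point is about 65% utilization, so an always-on baseline instance favors the reservation while a rarely used one does not.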

Auto-Scaling

You can configure scripts to scale your infrastructure automatically to match demand. For example, if you hit a CPU threshold monitored in, say, CloudWatch, a script can add read replicas and update your proxies with the new node. You can even use Amazon Aurora to add read replicas faster.
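The decision part of such a script can be kept separate from the AWS calls that act on it. Below is a minimal sketch of that decision logic only, assuming recent CPUUtilization samples have already been fetched; the thresholds, replica limits, and the `desired_replica_count` name are all hypothetical.

```python
def desired_replica_count(cpu_samples, current, scale_out_at=75.0,
                          scale_in_at=30.0, min_replicas=1, max_replicas=5):
    """Decide how many read replicas to run from recent CPU samples.

    Scales by one step only when every sample agrees, which avoids
    reacting to a single spike or dip.
    """
    if not cpu_samples:
        return current  # no data, make no change
    if all(sample > scale_out_at for sample in cpu_samples):
        return min(current + 1, max_replicas)
    if all(sample < scale_in_at for sample in cpu_samples):
        return max(current - 1, min_replicas)
    return current
```

The script would then compare the result with the current count, create or remove a replica accordingly, and update the proxy configuration.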

Instance Count

You can horizontally scale systems by adding read replicas. However, large clusters require more operational effort to maintain, as the application needs logic to distribute the read load across the replicas.
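The routing logic mentioned above might look something like this. It is a simplified sketch, assuming replica lag is reported from a metric such as CloudWatch's ReplicaLag; the `ReplicaRouter` class, the endpoint names, and the five-second lag limit are invented for illustration.

```python
from itertools import count


class ReplicaRouter:
    """Round-robin reads across replicas, skipping ones that lag too far."""

    def __init__(self, primary, replicas, max_lag_seconds=5.0):
        self.primary = primary
        self.replicas = list(replicas)
        self.max_lag_seconds = max_lag_seconds
        self.lag = {replica: 0.0 for replica in self.replicas}
        self._counter = count()

    def report_lag(self, replica, seconds):
        """Record the latest replication lag observed for a replica."""
        self.lag[replica] = seconds

    def pick_reader(self):
        """Return the endpoint that should serve the next read."""
        healthy = [r for r in self.replicas
                   if self.lag[r] <= self.max_lag_seconds]
        if not healthy:
            return self.primary  # fall back when every replica is stale
        return healthy[next(self._counter) % len(healthy)]
```

Every replica added to the cluster also has to be registered with this kind of router and monitored for lag, which is where the extra operational effort comes from.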

Conclusion

Here we have discussed how much easier it is to monitor Amazon RDS when you focus on a few vital metrics. These key metrics help you see the entire picture more clearly while troubleshooting issues in Amazon RDS.