Lester Sim

Posted on Aug 13, 2023

AWS Certified Database - Specialty Notes

#aws #database #cloud #certification


Disclaimer: The opinions expressed here are my own and I'm not writing on behalf of AWS or Amazon.

Here are some quick notes I've gathered to prepare for the certification:

Amazon RDS

Benefits of Managed Database

Automated provisioning
Continuous backups and ability to restore to specific timestamp
Monitoring dashboards
Read replicas for improved read performance
Multi-AZ setup for high availability and disaster recovery
Maintenance windows for OS patching and version upgrades
Scaling capability (vertical and horizontal)
Storage backed by EBS (gp2 or io1). Can be set to auto-scaling.

Pricing Model

Pay as you go pricing model
Instance types
- On-demand (Pay for compute capacity per hour)
- Reserved (deeply discounted, 1-year or 3-year term contract)
Storage (GB/month) / Backups / Snapshot Export to S3
I/O (per million requests)
Data transfer

RDS Instance Types

Standard
Memory-optimized (memory-intensive, high performance workloads)

RDS Storage Types

General Purpose Storage: General Purpose SSD volumes offer cost-effective storage that is ideal for a broad range of workloads running on medium-sized DB instances. General Purpose storage is best suited for development and testing environments.
Provisioned IOPS: Provisioned IOPS storage is designed to meet the needs of I/O-intensive workloads, particularly database workloads, that require low I/O latency and consistent I/O throughput. Provisioned IOPS storage is best suited for production environments.
RDS Storage Auto Scaling: Storage is scaled up automatically when the utilization nears the provisioned capacity. Triggers:
- Free available space is less than 10% of the allocated storage.
- The low-storage condition lasts at least five minutes.
- At least 6 hours have passed since the last storage modification.
The additional storage is in increments of whichever of the following is greater:
- 5 GiB
- 10% of currently allocated storage
- Storage growth prediction for 7 hours based on the FreeStorageSpace metrics change in the past hour.
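The trigger conditions and the "whichever is greater" rule above can be sketched in a few lines of Python. This is only an illustration of the logic as listed in these notes, not how RDS implements it. The function names and arguments are made up for the example, and the minimum increment is passed in explicitly rather than hard-coded.

```python
from datetime import datetime, timedelta


def should_autoscale(free_gib, allocated_gib, low_storage_minutes,
                     last_modified, now):
    """Return True only when all three trigger conditions from the notes hold."""
    low_on_space = free_gib < 0.10 * allocated_gib          # under 10% free
    sustained = low_storage_minutes >= 5                     # lasted 5+ minutes
    cooled_down = now - last_modified >= timedelta(hours=6)  # 6+ hours since last change
    return low_on_space and sustained and cooled_down


def storage_increment(allocated_gib, predicted_growth_7h_gib, floor_gib):
    """Pick the largest of: the fixed floor, 10% of allocated storage,
    and the predicted growth over the next 7 hours."""
    return max(floor_gib, 0.10 * allocated_gib, predicted_growth_7h_gib)


if __name__ == "__main__":
    now = datetime(2023, 8, 13, 12, 0)
    print(should_autoscale(8, 100, 5, now - timedelta(hours=7), now))
    print(storage_increment(100, 3, floor_gib=5))
```

With 100 GiB allocated, the 10% term dominates the floor. On small volumes the fixed floor wins instead.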

RDS Parameter Groups

A container for engine configuration values that is applied to one or more DB instances
Default parameter group cannot be edited. To make config changes, you must create a new parameter group
Changes to dynamic parameters always get applied immediately (irrespective of Apply Immediately setting)
Changes to static parameters require a manual reboot
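As a rough sketch of how the dynamic/static distinction shows up in practice: when modifying a custom parameter group through the API (for example boto3's modify_db_parameter_group), each entry carries an ApplyMethod of "immediate" or "pending-reboot". The helper below only builds that entry. The helper name is made up for this example, and the parameter name and apply type in the usage are assumptions, so check the ApplyType reported for your own parameter group.

```python
def build_parameter_change(name, value, apply_type):
    """Build one entry for the Parameters list of ModifyDBParameterGroup.

    apply_type is the parameter's ApplyType: "dynamic" or "static".
    Static parameters only take effect after a reboot, so they are
    submitted with ApplyMethod "pending-reboot".
    """
    if apply_type not in ("dynamic", "static"):
        raise ValueError(f"unknown apply type: {apply_type!r}")
    apply_method = "immediate" if apply_type == "dynamic" else "pending-reboot"
    return {
        "ParameterName": name,
        "ParameterValue": str(value),
        "ApplyMethod": apply_method,
    }


if __name__ == "__main__":
    # Assumed example: max_connections treated as a dynamic parameter.
    print(build_parameter_change("max_connections", 500, "dynamic"))
```

The resulting dictionaries would be passed as the Parameters list together with the name of your custom (non-default) parameter group.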

RDS Option Groups

For configuration of optional features offered by DB engines (not covered by parameter groups)

RDS Security

A traditional username and password can be used to log in to the database
IAM-based authentication can be used to log in to RDS MySQL and PostgreSQL.
You cannot SSH into an RDS DB instance.
You can map multiple IAM users or roles to the same database user account
Rotating RDS DB Credentials: Use AWS Secrets Manager, which supports automatic rotation of secrets. Secrets Manager provides a Lambda rotation function and automatically records the function's ARN in the secret.

RDS Backups

RDS supports automatic backups and captures transaction logs in real time.
Enabled by default with a 7-day retention period when created via the Console (0-35 days retention, 0 = disable automatic backups). The default backup retention period is one day if you create the DB instance using the Amazon RDS API or the AWS CLI.
Disabling automatic backups for a DB instance deletes all existing automated backups for the instance
Automated backups are deleted when the DB instance is deleted. Only manually created DB Snapshots are retained after the DB Instance is deleted.
Manual snapshot limits (100 per region) do not apply to automated backups.
The first automatic backup is a full backup. Subsequent backups are incremental.
Backup data is stored in an S3 bucket (owned and managed by the RDS service, so you won’t see it in your S3 console)
You can share manual DB snapshots with up to 20 AWS accounts. Automated Amazon RDS snapshots cannot be shared directly with other AWS accounts; copy the automated snapshot to a manual snapshot first, then share the copy. DB snapshots can also be copied across regions.

Multi-AZ Deployments and Read Replicas

Configuring and managing a Multi-AZ deployment: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZ.html
Working with Read Replicas: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html
To create read replicas, you need to enable automatic backups on source RDS DB instance.
Multi-AZ follows synchronous replication and spans at least two Availability Zones within a single region. Read Replicas follow asynchronous replication and can be within an Availability Zone, Cross-AZ, or Cross-Region.
Amazon RDS for MySQL, MariaDB and PostgreSQL allow you to add up to 15 read replicas to each DB Instance. Amazon RDS for Oracle and SQL Server allow you to add up to 5 read replicas to each DB Instance.
For managing multiple read replicas, you may add each read replica endpoint to a Route 53 record set and configure weighted routing to distribute traffic across different read replicas.
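To see how weighted routing spreads traffic, recall that each weighted record receives a share equal to its weight divided by the sum of all weights in the record set. The sketch below models only that arithmetic. The endpoint names are placeholders, not real hostnames.

```python
import random


def traffic_share(weights):
    """Map each endpoint to its expected share: weight / sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one record needs a positive weight")
    return {endpoint: weight / total for endpoint, weight in weights.items()}


def pick_endpoint(weights, rng=random):
    """Simulate one DNS answer by drawing an endpoint in proportion to its weight."""
    endpoints = list(weights)
    return rng.choices(endpoints, weights=[weights[e] for e in endpoints], k=1)[0]


if __name__ == "__main__":
    # Hypothetical replica endpoints; the larger replica gets double weight.
    replicas = {"replica-1.example": 1, "replica-2.example": 1, "replica-3.example": 2}
    print(traffic_share(replicas))
    print(pick_endpoint(replicas))
```

Giving a larger replica a higher weight sends it proportionally more read traffic. Setting a weight to 0 takes a replica out of rotation without deleting its record.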

RDS Monitoring

In RDS Console/CloudWatch: CPU, Memory, DatabaseConnections, IOPS, disk space consumption, etc.
RDS Recommendations: Automated suggestions for DB instances, read replicas, etc.
RDS Enhanced Monitoring: Get real-time OS-level metrics (CPU, Memory). An agent is automatically installed on the DB server to collect metrics, which are published to CloudWatch Logs.
RDS Performance Insights: Dashboard for performance tuning and analysis, e.g. which SQL query generates the highest load. Automatically publishes metrics to CloudWatch.

Amazon Aurora

Differences with RDS

Multi-AZ deployments for RDS MySQL follow synchronous replication whereas Multi-AZ deployments for Aurora MySQL follow asynchronous replication
Read Replicas can be manually promoted to a standalone database instance for RDS MySQL whereas Read Replicas for Aurora MySQL can be promoted to the primary instance
The primary and standby DB instances are upgraded at the same time for RDS MySQL Multi-AZ. All instances are upgraded at the same time for Aurora MySQL

Aurora Backtracking

Restoring a DB cluster to a point in time launches a new DB cluster and restores it from backup data or a DB cluster snapshot, which can take hours. Backtracking a DB cluster doesn't require a new DB cluster and rewinds the DB cluster in minutes.
The limit for a backtrack window is 72 hours.
Backtracking affects the entire DB cluster. For example, you can't selectively backtrack a single table or a single data update.
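A small sketch of the window check implied above: a backtrack target is only valid if it falls inside the configured window, which itself cannot exceed 72 hours. The function name is made up for this example. The real service may also offer a shorter effective window than configured, depending on how much change data it has retained, which this sketch ignores.

```python
from datetime import datetime, timedelta

MAX_BACKTRACK_WINDOW = timedelta(hours=72)


def can_backtrack(target, now, window=MAX_BACKTRACK_WINDOW):
    """Return True if `target` lies within the backtrack window ending at `now`."""
    if window > MAX_BACKTRACK_WINDOW:
        raise ValueError("backtrack window cannot exceed 72 hours")
    return now - window <= target <= now


if __name__ == "__main__":
    now = datetime(2023, 8, 13, 12, 0)
    print(can_backtrack(now - timedelta(hours=24), now))
    print(can_backtrack(now - timedelta(hours=80), now))
```

For anything older than the window, a point-in-time restore into a new cluster is the fallback.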

Aurora Cloning

Aurora cloning works at the storage layer of an Aurora DB cluster and uses a copy-on-write protocol: data is copied only at the time it changes, either on the source database or the clone database.
Aurora cloning is especially useful for quickly setting up test environments using your production data, without risking data corruption.
Because no data is copied up front, creating a clone is much faster than taking and restoring a manual snapshot of the DB cluster.
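Copy-on-write is easier to picture with a toy model. The class below is not Aurora's storage engine, just a minimal sketch of the idea: a clone starts out sharing every page with its source, and a page is only stored separately once one side writes to it.

```python
class Volume:
    """Toy copy-on-write volume: shared base pages plus private changes."""

    def __init__(self, base=None):
        self._base = base if base is not None else {}  # pages shared with relatives
        self._delta = {}                               # pages written privately

    def read(self, page):
        # Private pages shadow the shared ones.
        return self._delta.get(page, self._base.get(page))

    def write(self, page, data):
        # The shared base is never modified; the change lands in this volume only.
        self._delta[page] = data

    def clone(self):
        # Freeze the current view as a new shared base. Only the page table is
        # rebuilt here; the page contents themselves are shared by reference.
        shared = {**self._base, **self._delta}
        self._base, self._delta = shared, {}
        return Volume(base=shared)

    def pages_copied(self):
        return len(self._delta)


if __name__ == "__main__":
    prod = Volume()
    prod.write("p1", "orders")
    test_env = prod.clone()
    test_env.write("p1", "scrubbed orders")
    print(prod.read("p1"), "|", test_env.read("p1"), "|", test_env.pages_copied())
```

The clone costs nothing extra until it diverges, which is why it suits short-lived test environments.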

Failover

If the primary instance fails, a read replica is automatically promoted to primary; no manual intervention is needed
The failed primary instance becomes a read replica when it comes back online

Aurora Global Database

1 primary region (read/write) and up to 5 secondary regions (read-only). The underlying cluster storage volume is replicated to each secondary region.
If the primary region goes down, a secondary region can be promoted to become the new primary.

Aurora Serverless

Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down based on your application's needs. You can run your database in the cloud without managing any database instances.

Amazon DynamoDB

Fully managed, serverless, Key-Value database.

Consistency

Eventual consistency is the default read consistency model for all read operations. When issuing eventually consistent reads to a DynamoDB table or an index, the responses may not reflect the results of a recently completed write operation. If you repeat your read request after a short time, the response should eventually return the more recent item. Eventually consistent reads are supported on tables, local secondary indexes, and global secondary indexes.
Read operations such as GetItem, Query, and Scan provide an optional ConsistentRead parameter. If you set ConsistentRead to true, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful. Strongly consistent reads are only supported on tables and local secondary indexes.
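Here is a minimal sketch of how the ConsistentRead flag fits into a Query request, written as a helper that builds the parameter dictionary in the low-level client format. The helper name, table name, and attribute names are made up for illustration. The helper also rejects the one combination the notes rule out, a strongly consistent read on a global secondary index.

```python
def build_query(table, pk_name, pk_value, index_name=None,
                index_is_global=False, strongly_consistent=False):
    """Build Query parameters, refusing strong consistency on a GSI."""
    if strongly_consistent and index_name and index_is_global:
        raise ValueError("strongly consistent reads are not supported on a GSI")
    params = {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": pk_name},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        "ConsistentRead": strongly_consistent,  # False means eventually consistent
    }
    if index_name:
        params["IndexName"] = index_name
    return params


if __name__ == "__main__":
    # Hypothetical table and key names.
    print(build_query("Orders", "CustomerId", "c-1", strongly_consistent=True))
```

With boto3 installed and credentials configured, a dictionary like this could be unpacked into the DynamoDB client's query call.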

Scan vs Query Operation

Scan

The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index. To have DynamoDB return fewer items, you can provide a FilterExpression. The filter is applied after the items are read, so it does not reduce the read capacity consumed.
Supports eventually consistent and strongly consistent reads
Prefer Query over Scan when possible.

Query

Find items based on primary key values (partition key/sort key). Returns all items with that partition key.
Supports eventually consistent and strongly consistent reads
Faster than Scan because it only reads the specified partition
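The difference is easy to see with a toy in-memory table that counts how many items each operation has to examine. This is only a model of the access pattern, not of DynamoDB itself, and all names in it are made up for the example.

```python
from collections import defaultdict


class ToyTable:
    """Items grouped by partition key, with a sort key inside each group."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def put(self, pk, sk, **attrs):
        self._partitions[pk].append({"pk": pk, "sk": sk, **attrs})

    def scan(self, predicate=lambda item: True):
        """Examine every item in every partition, then filter."""
        examined, matched = 0, []
        for items in self._partitions.values():
            for item in items:
                examined += 1
                if predicate(item):
                    matched.append(item)
        return matched, examined

    def query(self, pk):
        """Examine only the items stored under one partition key."""
        items = self._partitions.get(pk, [])
        return sorted(items, key=lambda item: item["sk"]), len(items)


if __name__ == "__main__":
    table = ToyTable()
    table.put("user-1", "2023-08-01")
    table.put("user-1", "2023-08-02")
    table.put("user-2", "2023-08-01")
    print(table.scan(lambda item: item["pk"] == "user-1")[1])  # items examined by Scan
    print(table.query("user-1")[1])                            # items examined by Query
```

Both calls return the same two items, but the scan touches the whole table to find them.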

Primary Key

Simple Primary Key: Just 1 partition key
Composite Primary Key: Consists of 1 partition key and 1 sort key

Partition Key: Used for partition selection via DynamoDB internal hash function
Sort Key: Range select or to order results. Sort keys may not be used on their own.
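To illustrate partition selection, here is a sketch that hashes a partition key value and maps it to one of N partitions. DynamoDB's internal hash function is not something you call or configure. MD5 and the modulo step are stand-ins chosen purely for this illustration.

```python
import hashlib


def partition_for(partition_key, num_partitions):
    """Map a partition key value to a partition number (illustrative hash only)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-3"):
        print(key, "->", partition_for(key, 4))
```

The same key always lands on the same partition, which is why a Query on one partition key only has to visit one place, and why a single very hot key cannot be spread out.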

Local Secondary Indexes

Up to 5 LSIs
Has same partition key as the primary index of the table but has different sort key than the primary index of the table. A local secondary index is "local" in the sense that every partition of a local secondary index is scoped to a base table partition that has the same partition key value.
Can only be created at the time of creating the table and cannot be deleted later
Support eventual / strong / transactional consistency
Use Case:
- When application needs same partition key as the table
- When application needs strongly consistent index reads

Global Secondary Indexes

Up to 20 GSIs
Can have same or different partition key than the table’s primary index
Can have same or different sort key than the table’s primary index. Optional to have sort key.
A global secondary index is considered "global" because queries on the index can span all of the data in the base table, across all partitions.
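Can have a different key schema from the base table. Queries on a GSI can return only attributes that are projected into the index (the base table's primary key attributes are always projected); other attributes cannot be fetched from the base table.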
Supports only eventual consistency
Can be created or deleted any time
Has its own provisioned throughput. If the writes are throttled on the GSI, then the main table will be throttled too.
Use Case:
- When application needs different or same partition key as the table
- When application needs finer throughput control
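The LSI and GSI rules above can be seen side by side in a table definition. The sketch below builds CreateTable parameters for a hypothetical Orders table; every table, index, and attribute name is invented for the example. The LSI reuses the table's partition key with a different sort key, while the GSI uses a different partition key and carries its own provisioned throughput.

```python
def orders_table_definition():
    """CreateTable parameters for a hypothetical table with one LSI and one GSI."""
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    return {
        "TableName": "Orders",
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderDate", "AttributeType": "S"},
            {"AttributeName": "OrderTotal", "AttributeType": "N"},
            {"AttributeName": "OrderStatus", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},
            {"AttributeName": "OrderDate", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [{
            "IndexName": "ByTotal",
            "KeySchema": [
                {"AttributeName": "CustomerId", "KeyType": "HASH"},
                {"AttributeName": "OrderTotal", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: different partition key, with its own throughput settings.
        "GlobalSecondaryIndexes": [{
            "IndexName": "ByStatus",
            "KeySchema": [
                {"AttributeName": "OrderStatus", "KeyType": "HASH"},
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": dict(throughput),
        }],
        "ProvisionedThroughput": dict(throughput),
    }


if __name__ == "__main__":
    definition = orders_table_definition()
    print([index["IndexName"] for index in definition["LocalSecondaryIndexes"]])
```

Since LSIs can only be defined at table creation, they belong in this initial definition. The GSI could equally be added later with an update to the table.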

DynamoDB Accelerator (DAX)

Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for Amazon DynamoDB that delivers up to a 10 times performance improvement.

DynamoDB response times: Single-digit milliseconds
DynamoDB with DAX response times: Microseconds
Reduce read load on DynamoDB
Caches only eventually consistent reads; strongly consistent reads are passed through to DynamoDB and their results are not cached
Redirect your DynamoDB API request to the DAX endpoint instead of DynamoDB endpoint
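A toy read-through cache shows why DAX reduces read load and why its cached reads are only eventually consistent. This is a simplified sketch, not the DAX client: a plain dict stands in for the DynamoDB table, there is no TTL or eviction, and the class and method names are made up for the example.

```python
class ToyDax:
    """Read-through cache in front of a dict that stands in for a table."""

    def __init__(self, table):
        self._table = table
        self._cache = {}
        self.table_reads = 0  # how many reads actually reached the table

    def get_item(self, key, consistent_read=False):
        if consistent_read:
            # Strongly consistent reads bypass the cache and are not stored in it.
            self.table_reads += 1
            return self._table.get(key)
        if key not in self._cache:
            # Cache miss: read from the table once and remember the result.
            self.table_reads += 1
            self._cache[key] = self._table.get(key)
        return self._cache[key]


if __name__ == "__main__":
    table = {"item-1": "v1"}
    dax = ToyDax(table)
    dax.get_item("item-1")
    dax.get_item("item-1")
    print(dax.table_reads)  # the second read was served from the cache
```

If the table is updated directly, the cached copy stays stale until it expires in the real service, which is the trade-off for the lower latency.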

This is only a brief summary of the core topics I found to be important and not exhaustive. There are more database-related services covered in the certification. Please refer to https://aws.amazon.com/certification/certified-database-specialty/ for the full set of topics to prepare.
