AWS Database
Disclaimer: The opinions expressed here are my own and I'm not writing on behalf of AWS or Amazon.
Here are some quick notes I've gathered to prepare for the certification:
Amazon RDS
Benefits of Managed Database
- Automated provisioning
- Continuous backups and ability to restore to specific timestamp
- Monitoring dashboards
- Read replicas for improved read performance
- Multi-AZ setup for Disaster Recovery
- Maintenance windows for OS patching and version upgrades
- Scaling capability (vertical and horizontal)
- Storage backed by EBS (gp2 or io1). Can be configured to auto-scale.
Pricing Model
- Pay as you go pricing model
- Instance types
- On-demand (Pay for compute capacity per hour)
- Reserved (deeply discounted, 1-year or 3-year term contract)
- Storage (GB/month) / Backups / Snapshot Export to S3
- I/O (per million requests)
- Data transfer
RDS Instance Types
- Standard
- Memory-optimized (memory-intensive, high performance workloads)
RDS Storage Types
- General Purpose Storage: General Purpose SSD volumes offer cost-effective storage that is ideal for a broad range of workloads running on medium-sized DB instances. General Purpose storage is best suited for development and testing environments.
- Provisioned IOPS: Provisioned IOPS storage is designed to meet the needs of I/O-intensive workloads, particularly database workloads, that require low I/O latency and consistent I/O throughput. Provisioned IOPS storage is best suited for production environments.
- RDS Storage Auto Scaling: Storage is scaled up automatically when the utilization nears the provisioned capacity. Triggers:
- Free available space is less than 10% of the allocated storage.
- The low-storage condition lasts at least five minutes.
- At least 6 hours have passed since the last storage modification.
- The additional storage is in increments of whichever of the following is greater:
- 5 GiB
- 10% of currently allocated storage
- Storage growth prediction for 7 hours based on the FreeStorageSpace metrics change in the past hour.
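The trigger conditions and increment rule above can be sketched as a small function. This is only an illustration of the documented rule, not RDS's actual implementation; the function names and units (GiB) are my own.

```python
def autoscaling_increment_gib(allocated_gib, growth_last_hour_gib):
    """Sketch of the RDS storage auto scaling increment rule: add the
    greatest of 5 GiB, 10% of currently allocated storage, and the
    predicted growth over the next 7 hours (based on the last hour's
    FreeStorageSpace change)."""
    predicted_7h_growth = 7 * max(growth_last_hour_gib, 0)
    return max(5, 0.10 * allocated_gib, predicted_7h_growth)

def should_autoscale(allocated_gib, free_gib, low_storage_minutes,
                     hours_since_last_modification):
    """All three trigger conditions must hold before scaling starts."""
    return (
        free_gib < 0.10 * allocated_gib          # free space below 10% of allocated
        and low_storage_minutes >= 5             # condition persisted at least 5 minutes
        and hours_since_last_modification >= 6   # cooldown since last storage change
    )
```

For a 100 GiB volume with no recent growth, the increment is 10 GiB (10% of allocated beats the 5 GiB floor); a fast-growing workload can push the increment higher via the 7-hour prediction.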
RDS Parameter Groups
- A group of engine configuration settings applied to a database instance
- The default parameter group cannot be edited. To make config changes, you must create a custom parameter group
- Changes to dynamic parameters always get applied immediately (irrespective of Apply Immediately setting)
- Changes to static parameters require a manual reboot
RDS Option Groups
- For configuration of optional features offered by DB engines (not covered by parameter groups)
RDS Security
- Traditional Username and Password can be used to log in to the database
- IAM database authentication can be used to log in to RDS for MySQL and PostgreSQL.
- You cannot SSH into an RDS DB instance.
- You can map multiple IAM users or roles to the same database user account
- Rotating RDS DB Credentials: Use AWS Secrets Manager. Supports automatic rotation of secrets. Secrets Manager provides a Lambda rotation function and populates it automatically with the ARN in the secret.
RDS Backups
- RDS supports automatic backups, capturing transaction logs in real time
- Enabled by default with a 7-day retention period when the DB instance is created via the Console (0-35 days retention, 0 = disable automatic backups). The default backup retention period is one day if you create the DB instance using the Amazon RDS API or the AWS CLI.
- Disabling automatic backups for a DB instance deletes all existing automated backups for the instance
- Automated backups are deleted when the DB instance is deleted. Only manually created DB Snapshots are retained after the DB Instance is deleted.
- Manual snapshot limits (100 per region) do not apply to automated backups.
- The first automatic backup is a full backup. Subsequent backups are incremental.
- Backup Data is stored in a S3 bucket (owned and managed by RDS service, you won’t see them in your S3 console)
- You can share manual DB snapshots with up to 20 AWS accounts. Automated Amazon RDS snapshots cannot be shared directly with other AWS accounts. Can share DB snapshots across different regions.
Multi-AZ Deployments and Read Replicas
- Configuring and managing a Multi-AZ deployment: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZ.html
- Working with Read Replicas: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html
- To create read replicas, you need to enable automatic backups on the source RDS DB instance.
- Multi-AZ follows synchronous replication and spans at least two Availability Zones within a single region. Read Replicas follow asynchronous replication and can be within an Availability Zone, Cross-AZ, or Cross-Region.
- Amazon RDS for MySQL, MariaDB and PostgreSQL allow you to add up to 15 read replicas to each DB Instance. Amazon RDS for Oracle and SQL Server allow you to add up to 5 read replicas to each DB Instance.
- For managing multiple read replicas, you may add each read replica endpoint to a Route 53 record set and configure weighted routing to distribute traffic across different read replicas.
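The weighted-routing idea above can be simulated locally: Route 53 returns each weighted record with probability proportional to its weight. The endpoint names below are hypothetical, and `random.choices` only stands in for Route 53's selection.

```python
import random

# Hypothetical read replica endpoints and their Route 53 record weights
replica_weights = {
    "replica-1.example.rds.amazonaws.com": 50,
    "replica-2.example.rds.amazonaws.com": 30,
    "replica-3.example.rds.amazonaws.com": 20,
}

def pick_replica():
    """Simulate Route 53 weighted routing: each record is chosen with
    probability weight / sum(all weights)."""
    endpoints = list(replica_weights)
    weights = [replica_weights[e] for e in endpoints]
    return random.choices(endpoints, weights=weights, k=1)[0]
```

Over many lookups, replica-1 receives roughly half the read traffic and replica-3 about a fifth, which is the load-spreading effect the record set is configured for.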
RDS Monitoring
- In RDS Console/CloudWatch: CPU, Memory, DatabaseConnections, IOPS, disk space consumption, etc
- RDS Recommendations: Automated suggestions for DB instances, read replicas, etc
- RDS Enhanced Monitoring: Get real-time OS-level metrics (CPU, memory). An agent is automatically installed on the DB host to collect metrics. Metrics are pushed to CloudWatch as well.
- RDS Performance Insights: Dashboard for performance tuning and analysis eg. which SQL query has the highest load. Automatically publishes metrics to CloudWatch.
Amazon Aurora
Differences with RDS
- Multi-AZ deployments for RDS MySQL follow synchronous replication whereas Multi-AZ deployments for Aurora MySQL follow asynchronous replication
- Read Replicas can be manually promoted to a standalone database instance for RDS MySQL whereas Read Replicas for Aurora MySQL can be promoted to the primary instance
- The primary and standby DB instances are upgraded at the same time for RDS MySQL Multi-AZ. All instances are upgraded at the same time for Aurora MySQL
Aurora Backtracking
- Restoring a DB cluster to a point in time launches a new DB cluster and restores it from backup data or a DB cluster snapshot, which can take hours. Backtracking a DB cluster doesn't require a new DB cluster and rewinds the DB cluster in minutes.
- The limit for a backtrack window is 72 hours.
- Backtracking affects the entire DB cluster. For example, you can't selectively backtrack a single table or a single data update.
Aurora Cloning
- Aurora cloning works at the storage layer of an Aurora DB cluster. Uses a copy-on-write protocol.
- Aurora cloning is especially useful for quickly setting up test environments using your production data, without risking data corruption.
- Database cloning uses a copy-on-write protocol, in which data is copied only at the time the data changes, either on the source database or the clone database. Cloning is much faster than a manual snapshot of the DB cluster.
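A toy model makes the copy-on-write behavior concrete: the clone shares the source's storage pages and copies a page only when it is first written. This is my own illustration of the general protocol, not Aurora's storage engine.

```python
class CowClone:
    """Toy copy-on-write clone: shares the source's pages until a
    write occurs, then copies only the touched page."""

    def __init__(self, source_pages):
        self._source = source_pages   # shared, never modified by the clone
        self._own = {}                # pages copied on first write

    def read(self, page_id):
        # Prefer the clone's private copy; fall back to the shared page
        return self._own.get(page_id, self._source.get(page_id))

    def write(self, page_id, data):
        self._own[page_id] = data     # the copy happens only now

    def copied_pages(self):
        return len(self._own)
```

Because no pages are copied upfront, creating the clone is nearly instant regardless of database size, which is why cloning beats restoring a snapshot for spinning up test environments.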
Failover
- On failure, a read replica is automatically promoted to primary (automatic failover)
- The primary instance that failed becomes a read replica when it comes back online
Aurora Global Database
- 1 Primary Region (R/W), up to 5 secondary regions (Read only). Underlying cluster storage volume replicated to another region.
- If 1 region goes down, can promote another region to be the primary region.
Aurora Serverless
- Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down based on your application's needs. You can run your database in the cloud without managing any database instances.
Amazon DynamoDB
Fully managed, serverless, Key-Value database.
Consistency
- Eventually consistent is the default read consistency model for all read operations. When issuing eventually consistent reads to a DynamoDB table or an index, the response may not reflect the results of a recently completed write operation. If you repeat your read request after a short time, the response should eventually return the more recent item. Eventually consistent reads are supported on tables, local secondary indexes, and global secondary indexes.
- Read operations such as GetItem, Query, and Scan provide an optional ConsistentRead parameter. If you set ConsistentRead to true, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful. Strongly consistent reads are only supported on tables and local secondary indexes.
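The two consistency models can be illustrated with a toy table in which eventually consistent reads hit a lagging replica while strongly consistent reads go to the leader. This is a mental model only, not how DynamoDB is built internally; the class and method names are mine, loosely echoing the `ConsistentRead` parameter.

```python
class ToyTable:
    """Toy model of DynamoDB read consistency: eventually consistent
    reads may return stale data from a lagging replica; strongly
    consistent reads reflect all prior successful writes."""

    def __init__(self):
        self._leader = {}
        self._replica = {}   # lags behind until replicate() runs

    def put_item(self, key, value):
        self._leader[key] = value

    def replicate(self):
        # Replication eventually catches the replica up to the leader
        self._replica.update(self._leader)

    def get_item(self, key, consistent_read=False):
        source = self._leader if consistent_read else self._replica
        return source.get(key)
```

Right after a write, only a `consistent_read=True` read is guaranteed to see it; once replication catches up, both read modes agree.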
Scan vs Query Operation
Scan
- The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index. To have DynamoDB return fewer items, you can provide a FilterExpression operation.
- Eventual/Strong Consistency
- Prefer Query over Scan when possible.
Query
- Find items based on primary key values (partition key/sort key). Return all items with that partition key.
- Eventual/Strong Consistency
- Faster than Scan because it only reads the partition specified
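The cost difference shows up even in a toy in-memory version: Scan must examine every item in every partition before filtering, while Query jumps straight to one partition. The sample keys below are made up for illustration.

```python
# Toy table: partition key -> {sort key: item}
table = {
    "user#1": {1: "order-a", 2: "order-b"},
    "user#2": {1: "order-c"},
}

def scan(filter_fn):
    """Scan: examines every item in every partition, then filters
    (like applying a FilterExpression after the read)."""
    examined, results = 0, []
    for partition in table.values():
        for item in partition.values():
            examined += 1
            if filter_fn(item):
                results.append(item)
    return examined, results

def query(partition_key):
    """Query: reads only the partition selected by the partition key,
    returning its items in sort-key order."""
    partition = table.get(partition_key, {})
    return len(partition), [partition[k] for k in sorted(partition)]
```

Note that Scan "examines" all three items even when the filter matches few of them (and read capacity is consumed accordingly), whereas Query touches only the two items under `user#1`.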
Primary Key
Simple Primary Key: Just 1 partition key
Composite Primary Key: Composed of 1 partition key and 1 sort key
Partition Key: Used for partition selection via DynamoDB internal hash function
Sort Key: Range select or to order results. Sort keys may not be used on their own.
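Hash-based partition selection can be sketched as follows. DynamoDB's actual internal hash function is not public; MD5 is used here purely as a stand-in to show how a partition key deterministically maps to one of N partitions.

```python
import hashlib

def choose_partition(partition_key, num_partitions):
    """Illustrative hash-based partition selection: hash the
    partition key and take the result modulo the partition count.
    (DynamoDB's real internal hash function is not public.)"""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```

The key property is determinism: the same partition key always lands on the same partition, which is exactly why Query can skip straight to one partition while Scan cannot.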
Local Secondary Indexes
- Up to 5 LSIs
- Has same partition key as the primary index of the table but has different sort key than the primary index of the table. A local secondary index is "local" in the sense that every partition of a local secondary index is scoped to a base table partition that has the same partition key value.
- Can only be created at the time of creating the table and cannot be deleted later
- Support eventual / strong / transactional consistency
- Use Case:
- When application needs same partition key as the table
- When application needs strongly consistent index reads
Global Secondary Indexes
- Up to 20 GSIs
- Can have same or different partition key than the table’s primary index
- Can have same or different sort key than the table’s primary index. Optional to have sort key.
- A global secondary index is considered "global" because queries on the index can span all of the data in the base table, across all partitions.
- Can have a different schema from the base table. Queries on a GSI can return only attributes projected into the index; other base-table attributes cannot be fetched.
- Supports only eventual consistency
- Can be created or deleted any time
- Has its own provisioned throughput. If the writes are throttled on the GSI, then the main table will be throttled too.
- Use Case:
- When application needs different or same partition key as the table
- When application needs finer throughput control
DynamoDB Accelerator (DAX)
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for Amazon DynamoDB that delivers up to a 10 times performance improvement.
- DynamoDB response times: Single-digit milliseconds
- DynamoDB with DAX response times: Microseconds
- Reduce read load on DynamoDB
- Supports only eventual consistency
- Redirect your DynamoDB API request to the DAX endpoint instead of DynamoDB endpoint
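A toy read-through cache shows both DAX properties listed above: repeated reads are served from memory (reducing load on the table), and because the cache holds the value it last fetched, reads are only eventually consistent with the underlying table. This is my own sketch of the caching pattern, not the DAX client.

```python
class ReadThroughCache:
    """Toy DAX-style read-through cache: on a miss, fetch from the
    underlying table and cache the value; later reads are served
    from memory and may be stale (eventual consistency)."""

    def __init__(self, table):
        self._table = table
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get_item(self, key):
        if key in self._cache:
            self.hits += 1              # served from memory, table untouched
        else:
            self.misses += 1
            self._cache[key] = self._table.get(key)
        return self._cache[key]
```

After the table is updated directly, the cache keeps returning the old value until the entry is refreshed, which is why DAX supports only eventually consistent reads.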
This is only a brief summary of the core topics I found to be important and not exhaustive. There are more database-related services covered in the certification. Please refer to https://aws.amazon.com/certification/certified-database-specialty/ for the full set of topics to prepare.