• OLTP: Online Tx Processing, e.g. return a row with id is equal to something.
• OLAP: Online Analytics Processing, e.g. Sum of radios sold in the given region.
• Data ware housing is meant for analytics.
RDS (Relational Database Services) (OLTP)
• Following DBs are available in AWS:
o SQL Server
o Oracle
o MySQL, an open source
o Postgres
o Aurora
o MariaDB
• RDS instance end point is always DNS name not the IP address.
• Whenever there is an integration between 2 services, it involves the SGs integration as well, that means, the inbound rule of second SG should have first SG.
• Backups:
o Types
▪ Automated
• It is highly discouraged to disable the automated backups in Amazon RDS because it disables point-in-time recovery. Disabling automatic backups for a DB instance deletes all existing automated backups for the instance. If you disable and then re-enable automated backups, you are only able to restore starting from the time you re-enabled automated backups.
• It gets deleted after original DB is deleted.
• It allows you to recover database to any point in time with in “retention period”.
• The retention period can be between 1 and 35 days.
• It will take a full daily snapshot and also store the Tx logs throughout the day.
• While recovering, it chooses the most recent daily back up and then apply Tx logs relevant to that day.
• It's enabled by default.
• It's stored in S3.
• You get free storage space in S3 equal to that of the size of DB.
▪ Snapshot
• It's done manually.
• It’s stored even after original DB is deleted.
o The restored DB is always a new DB instance with the new DNS end point.
o Backup time also can be defined.
o During the back, the I/O operation will be slow.
• Encryption
o Encryption at rest is supported for MySQL, Oracle, SQL Server, PostgreSQL, MariaDB, Aurora
o Encryption is done using AWS Key Management Service.
o Once RDS instance is encrypted then it’s underlying data, it’s automated backups, snapshots and read replicas are also encrypted.
o Encrypting an existing RDS instance is not supported, to encrypt an existing RDS instance:
▪ Take a snapshot of DB.
▪ Make a copy of the snapshot.
▪ Encrypt this copy.
• Multi AZ
o It provides the stand by DB, where any write operations to primary DB gets reflected immediately to stand by DB.
o It's for Data recovery.
o It's available in following DB:
▪ SQL Server
▪ Oracle
▪ MySQL
▪ Postgres
▪ Aurora: by default, it’s available
▪ MariaDB
o Multi-AZ deployments for the MySQL, MariaDB, Oracle, and PostgreSQL engines utilize synchronous physical replication to keep data on the standby up-to-date with the primary.
o Multi-AZ deployments for the SQL Server engine use synchronous logical replication to achieve the same result, employing SQL Server-native Mirroring technology. Both approaches safeguard your data in the event of a DB Instance failure or loss of an Availability Zone.
o Failover is automatically handled by Amazon Aurora so that your applications can resume database operations as quickly as possible without manual administrative intervention.
o If you have an Amazon Aurora Replica in the same or a different Availability Zone, when failing over, Amazon Aurora flips the canonical name record (CNAME) for your DB Instance to point at the healthy replica, which in turn is promoted to become the new primary. Start-to-finish, failover typically completes within 30 seconds.
o If you do not have an Amazon Aurora Replica (i.e. single instance), Aurora will first attempt to create a new DB Instance in the same Availability Zone as the original instance. If unable to do so, Aurora will attempt to create a new DB Instance in a different Availability Zone. From start to finish, failover typically completes in under 15 minutes.
• Read Replicas:
o Read replicas is needed because the internet world is more of read operation and less of write operation.
o Whenever there is a write operation, it pushes the data to all read replicas.
o Read replicas can be present in different AZ or even different region too.
o It provides up to 5 read replicas per instance.
o Coping data is asynchronous operation.
o Used for scaling not DR as read replicas is read only.
o Each read replica has its own DNS point.
o Must have automatic backups on in order to deploy a read replica.
o It can be promoted to its own DB, but it breaks the replication.
o Following DB supports RR:
▪ MySQL
▪ PostgreSQL
▪ Amazon Aurora
▪ MariaDB
• When replicating data from your primary RDS instance to your secondary RDS instance, there is no charge.
• I/O operations is suspended while you take a database snapshot
• In RDS when using multiple availability zones, you cannot use the secondary database as an independent read node.
• By default, the maximum provisioned IOPS capacity on an Oracle and MySQL RDS instance (using provisioned IOPS) is 30,000 IOPS: False
• In RDS when using multiple availability zones, you cannot use the secondary database as an independent read node?
Dynamo DB
• NoSQL
• It's both document and key-value data model.
• It's stored on SSD
• It's spread across 3 different Data centers.
• Consistency:
o Options:
▪ Eventual Consistent Read (Default)
▪ Strongly consistent Read
o It comes with “Eventual Consistent Read” options where consistency across all copies of data is usually within a second, so better wait for some time (at least 1 second), before you make read operation after you updated the data.
• DynamoDB is easier to scale without any downtime.
• DynamoDB supports key-value and document data structures. A key-value store is a database service that provides support for storing, querying, and updating collections of objects that are identified using a key and values that contain the actual content being stored. Meanwhile, a document data store provides support for storing, querying, and updating items in a document format such as JSON, XML, and HTML.
• The DynamoDB Time-to-Live (TTL) mechanism enables you to manage web sessions of your application easily. It lets you set a specific timestamp to delete expired items from your tables. Once the timestamp expires, the corresponding item is marked as expired and is subsequently deleted from the table. By using this functionality, you do not have to track expired data and delete it manually. TTL can help you reduce storage usage and reduce the cost of storing data that is no longer relevant.
• Amazon DynamoDB stores structured data indexed by primary key, and allows low latency read and write access to items ranging from 1 byte up to 400KB. Amazon S3 stores unstructured blobs and is suited for storing large objects up to 5 TB. In order to optimize your costs across AWS services, large objects or infrequently accessed data sets should be stored in Amazon S3, while smaller data elements or file pointers (possibly to Amazon S3 objects) are best saved in Amazon DynamoDB.
Elastic Cache
• It's a web service, which caches the most frequent query result.
• You can manage HTTP session data from the web servers using an In-Memory Key/Value store such as Redis and Memcached.
• It supports 2 open source in-memory cache options:
o Memcached: Memcached is an in-memory key-value store for small arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
o Redis: Redis is an open source, in-memory data structure store used as a database, cache, and message broker.
Red Shift (OLAP)
• Used by the organization to analyze the data, which are huge.
• It's used for OLAP Tx.
• Configuration
o Single Node (160 GB)
o Multi Node:
▪ Leader Node: manages client connection
▪ Compute Node: store data and perform query. Up to 128 nodes.
• Unlike any other normal DB,
o this is column-based storage, where queries mostly involve performing aggregates over the huge set of data
o Column based data is better compressed than row-based data as column-based data is of same type while row-based data is across all columns and these columns are of different type.
• It doesn’t need any indexes and materialized views so occupy less space.
• With the multi nodes architecture, it provides Massively Parallel Processing (MPP)
• You will be charged for compute node not for leader node.
• Encryption
o Encryption at rest using AES-256 encryption.
o By default, Red shift takes care of key management, but still you can manage it using:
▪ AWS Key Management Service.
▪ HSM
o Encrypted in transit using SSL.
• This is not production DB, it's for analytics hence doesn’t make sense for Multi AZ, hence it’s single AZ.
• Can restore snapshot to new AZ in case of outage.
Aurora
• MySQL compatible DB
• It has downtime during scale up time.
• 2 copies of data in each AZ, with minimum of 3 AZ, so 6 copies of your data. That means it can survive up to 3 copy loss for read availability and 2 copy loss for write availability. Explanation could be because at least 1 copy for read should be present in each AZ, hence it can survive loss of 3 copy for read operation. Out of 3 AZ, 1 will be primary DB but not sure which copy so that particular AZ shouldn’t be down and hence it can bear loss of 2 copies for write operation.
• It's self-healing. Data blocks and disks are scanned for error and repaired automatically.
• 2 types of replicas:
o Aurora replicas: total 15
o MySQL replicas: 5
• Ony difference between Aurora replicas and MySQL replicas is Aurora replicas is fail over.
Top comments (0)