This was originally posted on the Yugabyte blog and details the Aurora DSQL Disaggregated Architecture
Seven years after the launch of the first cloud-native distributed SQL database, Google Spanner, AWS has entered the distributed SQL arena. Amazon announced Aurora DSQL at re:Invent 2024. This new distributed SQL database validates the growing importance of distributed SQL and introduces unique architectural features tied to the AWS infrastructure.
YugabyteDB was the first distributed SQL database to use PostgreSQL code for SQL processing, rather than emulating it in another language. Amazon Aurora DSQL adopts a similar approach, leveraging PostgreSQL code to avoid limitations related to data type arithmetic, library inconsistencies, and the complexities of rewriting an engine that benefits from decades of community development.
This blog examines Aurora DSQL's PostgreSQL compatibility, architectural distinctions, concurrency control, multi-region capabilities, and positioning relative to YugabyteDB in the evolving database landscape.
Isn't Amazon Aurora DSQL Just Another RDS Aurora?
Amazon Aurora DSQL is a serverless, distributed SQL database designed for transactional workloads. It supports ACID transactions, scales automatically based on workload demands, and provides high availability, which AWS advertises as 99.99% within a single region and 99.999% across multiple regions. Aurora DSQL's architecture disaggregates compute, storage, and I/O, scaling each independently and enabling automated failure recovery without traditional failover processes.
While it shares the "Aurora" name, Aurora DSQL is architecturally distinct from the existing Aurora services such as Amazon Aurora PostgreSQL-Compatible (abbreviated APG) and its Aurora Serverless and Aurora Limitless options.
Aurora employs a single-writer architecture with distributed storage (internal name Grover) across multiple availability zones. Aurora Serverless automates infrastructure scaling (internal name Caspian) for applications with variable traffic patterns. Aurora Limitless introduces sharding for high scalability (similar to Postgres-XL), which is optimal for single-shard transactions. In contrast, Aurora DSQL uses a distributed, active-active design with automatic scaling and redundancy built on other AWS components (Firecracker micro-VMs and the internal log service that many AWS services use).
Sharing the same umbrella name can confuse application developers who use RDS PostgreSQL or Aurora APG while trying to expand their solutions. Due to its unique transaction behavior and fewer features, migrating an application from Aurora to Aurora DSQL requires more effort than moving to a PostgreSQL-compatible database like YugabyteDB. No migration tool, such as AWS DMS or YugabyteDB Voyager, is currently available for Aurora DSQL.
What is Aurora DSQL Disaggregated Architecture?
Amazon Aurora DSQL is built on a disaggregated distributed architecture. Key components include:
- Frontend: This handles client connections using the PostgreSQL protocol. It includes a transaction and session router that directs connections to the appropriate compute layer, similar to PgBouncer but integrated into the database. Connection management has always been problematic with PostgreSQL's process-per-connection model, and a scalable application requires different connection handling. YugabyteDB also has a transparent connection manager that maps logical connections from the application to physical connections from a database-resident connection pool. Unlike Aurora DSQL, all features remain available through the YugabyteDB connection manager, including temporary tables, sequences, and prepared statements, by storing session state in shared memory or falling back to connection stickiness when sharing is impossible.
- Query Processor: This stateless compute layer runs the PostgreSQL engine. It processes SQL statements by parsing, query rewriting, query planning, and execution. Aurora DSQL doesn't support session-level objects, like temporary tables or sequences, and each transaction creates a new Firecracker micro-VM from a snapshot. Separating the stateless SQL processing layer so it can scale independently resembles YugabyteDB's PostgreSQL-compatible YSQL query layer. However, YugabyteDB maintains the session state to provide all PostgreSQL features.
- Transaction Processor: Unlike PostgreSQL, Aurora DSQL maximizes scalability by avoiding access to the database state during transaction operations. Write intents are logged, and an Adjudicator validates compatibility between the transaction's read and write states. Conflicting transactions are rejected. To bound this complexity, transactions are limited to 10,000 write intents: 10,000 rows without secondary indexes, or fewer if there are index entries to update. YugabyteDB, by contrast, checks for conflicts during each batch of write operations to maintain PostgreSQL's isolation levels, and has no limit on transaction intents, allowing for both bulk operations and small transactions.
- Journal: Aurora DSQL implements a distributed log service to ensure durability. Like all SQL databases, write-ahead logging (WAL) persists transaction intents before they are asynchronously applied to storage. Aurora DSQL's journal leverages AWS's mature infrastructure, which is already used in services like S3 and DynamoDB. YugabyteDB achieves durability through Raft-based replication of transaction intents, which adds more write latency but enables ongoing transactions to persist through node failures.
- Storage: Data is sharded using a shard map, with changes applied from the journal to the respective storage engines. Aurora DSQL's storage is distinct from Aurora's multi-AZ storage. It can store cold data in S3. YugabyteDB distributes transaction intents across tablets, which can asynchronously apply the committed changes to the regular datastore. Both transaction intents (provisional records) and regular committed changes are stored in LSM trees by YugabyteDB. The LSM trees are based on RocksDB, but include MVCC garbage collection in their compaction to avoid all the vacuum issues of PostgreSQL.
- Time Synchronization: Bounded clock skew is vital for strong consistency and low latency in distributed databases like Aurora DSQL. The AWS Time Sync Service provides highly accurate time to EC2 instances using atomic clocks, GPS receivers, and a dedicated network, achieving microsecond-level accuracy. Aurora DSQL uses it to read consistent MVCC snapshots. YugabyteDB can be deployed anywhere and doesn't need such a dedicated service, but it can leverage Time Sync on AWS to increase performance.
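The commit-time validation performed by the adjudicator can be sketched in a few lines. This is a minimal illustration of first-committer-wins validation under snapshot isolation, not the actual AWS implementation: the class, method names, and data structures are all assumptions made for the example.

```python
class Adjudicator:
    """Toy commit-time validator: first committer wins on overlapping write sets."""

    def __init__(self):
        self.commit_log = []  # list of (commit_ts, frozenset of written keys)
        self.clock = 0        # logical clock standing in for synchronized time

    def begin(self):
        # A transaction reads from a snapshot taken at its start timestamp.
        self.clock += 1
        return self.clock

    def try_commit(self, start_ts, write_set):
        # Reject if any transaction that committed after this one started
        # wrote a key this transaction also wants to write.
        for commit_ts, written in self.commit_log:
            if commit_ts > start_ts and written & write_set:
                return None  # conflict: the application must retry
        self.clock += 1
        self.commit_log.append((self.clock, frozenset(write_set)))
        return self.clock

adj = Adjudicator()
t1 = adj.begin()
t2 = adj.begin()
print(adj.try_commit(t1, {"row:1"}))  # commits
print(adj.try_commit(t2, {"row:1"}))  # rejected: t1 committed the same row first
```

Because validation happens only at commit, reads and writes proceed without locks, which is what keeps cross-region latency down to a single synchronization.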
How Does Sharding Scale Horizontally in Aurora DSQL?
Aurora DSQL generates a distinct cluster-wide key for each row using a primary key, facilitating automatic storage partitioning via range-based sharding. In contrast to Aurora Limitless, which employs local indexes, Aurora DSQL independently distributes secondary indexes from the primary key to enhance access patterns and enforce unique global constraints.
In Aurora DSQL, transactions are distributed. Client sessions connect to separate query processors that manage SQL processing with snapshot isolation, executing reads and writes in local memory. Most operations are handled locally, reducing cross-region latency. Reads access storage directly, while writes are temporarily stored and later committed to a distributed journal, ensuring consistency through a totally ordered stream of transactions. The adjudicator addresses conflicts between concurrent transactions at commit time, committing consistent writes and aborting conflicting transactions.
This approach is similar to YugabyteDB. However, rather than queuing locally, provisional records are routed to their destination, allowing immediate conflict detection. YugabyteDB adds partial key entries to indicate intents for ranges or tables. The transaction status is also distributed, enabling readers to synchronize with others' transaction intents and identify which ones have been committed. Additionally, YugabyteDB incorporates hash in addition to range sharding to improve distribution for values susceptible to point queries rather than range queries.
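Hash sharding as YugabyteDB applies it prepends a 16-bit hash of the key, so point-lookup keys spread uniformly across tablets. The 16-bit hash space (0x0000 to 0xFFFF) matches YugabyteDB's documented design; the hash function and tablet count below are simplifications for illustration.

```python
import hashlib

NUM_TABLETS = 8  # illustrative; the real tablet count varies with the cluster

def hash_code(key: str) -> int:
    """16-bit hash of the key. YugabyteDB uses its own hash function;
    SHA-256 stands in for it in this sketch."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:2], "big")

def tablet_for(key: str) -> int:
    # Each tablet owns an equal slice of the 65536-value hash space.
    return hash_code(key) * NUM_TABLETS // 65536

# Sequential keys land on different tablets, avoiding a hot range:
print([tablet_for(f"user:{i}") for i in range(5)])
```

The trade-off is that range scans over the hashed column must fan out to all tablets, which is why YugabyteDB lets the schema choose hash or range sharding per key.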
What are the PostgreSQL-Compatibility Limitations of Aurora DSQL?
Aurora DSQL aims to leverage PostgreSQLâs maturity with varying levels of compatibility:
- Authentication: Aurora DSQL employs a distinct authentication mechanism compared to PostgreSQL and YugabyteDB. It replaces long-lived, user-defined passwords and LDAP with short-lived AWS-native authentication tokens generated through IAM.
- Wire Protocol: Aurora DSQL uses the PostgreSQL wire protocol, enabling compatibility with standard PostgreSQL drivers and tools like psql and DBeaver.
- SQL Dialect: Aurora DSQL understands PostgreSQL SQL syntax, allowing most PostgreSQL code to run with minimal changes. It eliminates the need for dedicated database dialects in frameworks like Hibernate.
- Features: Aurora DSQL supports core relational features like ACID transactions, secondary indexes, joins, DDL, and DML. However, some features (e.g., foreign keys, triggers, PL/pgSQL procedures, temporary tables, and PostGIS extensions) are unsupported. When Aurora DSQL becomes generally available, some features like PL/pgSQL procedures and triggers will be added. Other features, like sequences or foreign keys, may remain limited, reflecting design trade-offs for scalability.
- Runtime: Aurora DSQL employs Optimistic Concurrency Control (OCC), which synchronizes transactions at commit time. This design improves cross-region read latency but offers only snapshot isolation (equivalent to PostgreSQL's REPEATABLE READ) and requires dedicated application logic. YugabyteDB supports other isolation levels, including Serializable and Read Committed, providing closer PostgreSQL compatibility at the cost of higher latency due to synchronization.
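The MVCC snapshot reads that both engines rely on can be pictured with a tiny version store: each key holds timestamped versions, and a transaction reading at its start timestamp sees the newest version committed at or before that timestamp. The data below is invented for the example.

```python
# key -> list of (commit_ts, value), ordered by commit timestamp
versions = {"balance": [(5, 100), (12, 80)]}

def snapshot_read(key, snapshot_ts):
    """Return the newest value committed at or before snapshot_ts."""
    visible = [v for ts, v in versions[key] if ts <= snapshot_ts]
    return visible[-1] if visible else None

print(snapshot_read("balance", 10))  # 100: the write at ts=12 is invisible
print(snapshot_read("balance", 12))  # 80
print(snapshot_read("balance", 3))   # None: no version existed yet
```

This is what makes repeatable reads possible without locks, and why accurate bounded clocks (such as the AWS Time Sync Service) matter: the snapshot timestamp must order consistently against commit timestamps chosen on other nodes.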
The preview version of Aurora DSQL is missing several key PostgreSQL features, but many can be added easily, except those constrained by concurrency control.
YugabyteDB exemplifies how straightforward it is to incorporate PostgreSQL features already designed and validated by the PostgreSQL community. Beginning in 2019 with PostgreSQL 10, YugabyteDB quickly merged PG11 code along with some PG12 features. This is showcased by the successful addition of complex elements such as PL/pgSQL, triggers, and extensions like pg_hint_plan, pg_stat_statements, pg_cron, pg_partman, and pgvector. YugabyteDB has recently merged PostgreSQL 15 and can execute rolling upgrades. Aurora DSQL can do the same for SQL-layer features because it uses the PostgreSQL code.
How Does Multi-AZ and Multi-Region Resilience Work on Aurora DSQL?
Aurora DSQL is designed for resilience and high availability:
- Multi-AZ Architecture: Data is replicated across multiple availability zones within a region. Requests are automatically routed to healthy resources, maintaining availability during failures.
- Multi-Region Deployments: Aurora DSQL's linked clusters provide two regional endpoints as a single logical database. These clusters support strong consistency for concurrent read and write operations, ensuring low-latency read performance across regions. In case of regional failures, Aurora DSQL maintains availability with minimal downtime through a witness region storing transaction logs for recovery.
Aurora DSQL is tightly integrated into AWS infrastructure, and high availability has different trade-offs than a database that must be resilient on commodity hardware or work on any cloud provider. In these trade-offs, Aurora DSQL's priority is reducing transaction latency to a single synchronization across regions. This sacrifices features requiring more synchronization, like sequences, transactional DDL, or implicit locking.
YugabyteDB's multi-region capabilities follow a different approach. It uses Raft-based replication to maintain complete copies of data across regions. Its architecture enables flexible cluster configurations, including leader preferences and data placement policies, for reduced latency and regulatory compliance. While YugabyteDB's design supports greater flexibility and resilience, its synchronization requirements can result in higher latencies than Aurora DSQL's commit-time coordination.
Aurora DSQL vs. YugabyteDB: Competition or Sharing the Same Vision?
As distributed SQL databases leveraging PostgreSQL code, Aurora DSQL and YugabyteDB share similarities, but diverge in design and use cases:
- PostgreSQL Compatibility: YugabyteDB is more compatible with PostgreSQL, supporting features like triggers, PL/pgSQL, foreign keys, sequences, all isolation levels, and explicit locking. Aurora DSQL prioritizes scalability, sacrificing some features for performance.
- Deployment: Aurora DSQL is proprietary to Amazon and optimized for serverless, AWS-native applications. YugabyteDBâs open-source database supports multi-cloud, on-premises, and hybrid deployments, providing greater flexibility for diverse environments.
- Applications: Aurora DSQL's optimistic concurrency control minimizes cross-region latency, but requires applications to handle retries. YugabyteDB's approach offers stricter isolation levels and transactional resilience, but incurs higher synchronization overhead. Its Read Committed isolation and Wait-on-Conflict locking do not require adding complex retry logic in the application.
- Use Cases: Aurora DSQL is ideal for new, serverless applications built on AWS. It is a straightforward evolution for DynamoDB users who want more SQL features. YugabyteDB is a drop-in replacement for PostgreSQL, supporting legacy applications and offering flexibility for multi-cloud and hybrid environments. YugabyteDB is the most resilient open-source database to migrate to when replacing Oracle Database, IBM DB2, or Microsoft SQL Server.
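With optimistic concurrency control, a commit can fail with a serialization error (SQLSTATE 40001 in PostgreSQL terms), and the application is expected to retry the whole transaction. A minimal retry wrapper looks like this; the exception class is an illustrative stand-in for the error a real driver would raise:

```python
import random
import time

class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001 (serialization_failure)."""

def run_with_retry(txn_fn, attempts=5):
    """Run a transaction function, retrying on serialization failures
    with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))
```

A transaction passed to `run_with_retry` must be safe to re-execute from the start, which is the "dedicated application logic" an OCC database asks of its developers; under YugabyteDB's Read Committed with Wait-on-Conflict locking, this wrapper is unnecessary.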
Both are cloud-native SQL databases optimized for transactional workloads that require high availability and elasticity. Both use a fork of PostgreSQL code to provide the best compatibility with this popular, full-featured, open-source database. The two distributed SQL databases add enterprise-level features with encryption and more predictable performance to PostgreSQL applications. They also solve the problem of maintenance windows and downtime with online rolling upgrades.
Conclusion
Aurora DSQL and YugabyteDB represent the future of distributed databases, highlighting the trade-offs between scalability, compatibility, and developer experience. While they may compete in specific scenarios, their strengths validate distributed SQLâs growing importance and transformative potential in modern database architectures.
Aurora DSQL is a serverless, disaggregated PostgreSQL fork optimized for transactional workloads. Its strengths include an active-active multi-region architecture for low-latency, consistent reads and writes, and optimistic concurrency control for enhanced performance. However, limitations like partial PostgreSQL compatibility and missing features, including foreign keys, sequences, and explicit locking, may require developers to adjust their application designs.
YugabyteDB is a fully compatible PostgreSQL fork deployed on multiple nodes across data centers, zones, regions, and cloud vendors. It is suitable for applications that need PostgreSQL behavior without changing their code. It uses Raft consensus to distribute and replicate table rows, index entries, and transaction intents. Data placement can be controlled through geo-partitioning using PostgreSQL partitions and tablespaces.
Both databases illustrate the adaptability of relational systems to cloud-native infrastructure and showcase different strategies that merge high performance with horizontal scalability. AWS offers specialized database services designed for cloud-native applications, and many RDS Aurora services offer compatibility with PostgreSQL and features like multi-tenancy, elasticity, resilience, and multi-region support. YugabyteDB delivers these features within a converged data platform, taking advantage of cloud capabilities while providing cloud vendor flexibility for users.