Inside Amazon Aurora: Innovations in Distributed Database Design

Posted in distributed-systems by Christopher R. Wirz on Thu Sep 19 2024



Amazon Aurora is a cloud-native relational database that takes a novel approach to distributed systems design. One of Aurora's core design principles is avoiding distributed consensus algorithms like Paxos or two-phase commit for most operations. Instead, Aurora uses clever techniques to maintain consistency using local state and asynchronous operations:

  1. Quorum-based writes: Aurora uses a quorum model where writes are considered durable once acknowledged by a majority of storage nodes. This avoids the need for synchronous consensus.

  2. Local consistency points: The database tracks consistency points like Volume Complete LSN (VCL) locally based on acknowledgments from storage nodes. This allows commits to be processed without distributed agreement.

  3. Asynchronous writes: All writes, including commits, are sent asynchronously to storage nodes. This improves performance by avoiding synchronous network round trips.

  4. Crash recovery: After a crash, the database reconstructs consistency points from storage node state, avoiding the need to maintain this state distributedly.

Efficient Reads

Aurora also employs techniques to make reads more efficient in a distributed environment:

  1. Avoiding quorum reads: The database tracks which storage nodes have the latest version of data, allowing it to read from a single node instead of a quorum.

  2. Read replicas: Replicas share the same underlying storage as the primary, allowing quick setup and teardown of read capacity.

  3. Structural consistency: Careful ordering of updates ensures replicas maintain structural consistency of data structures like B-trees.

  4. Snapshot isolation: Read views are anchored to consistency points to provide transactional isolation across distributed nodes.

Flexible Quorum Management

Aurora uses some innovative approaches to quorum management:

  1. Quorum sets: Membership changes are done through a series of reversible transitions using Boolean logic to define valid quorums. This allows non-blocking, fault-tolerant membership updates.

  2. Epochs: Monotonically increasing epochs are used to quickly detect and reject stale operations during membership changes.

  3. Heterogeneous quorums: Aurora uses "full" and "tail" segments in its quorums to optimize for cost while maintaining availability.

The Key Insight: Leveraging Local State

The overarching theme is Aurora's clever use of local, ephemeral state to avoid expensive distributed consensus in the common case. By tracking consistency points locally and reconstructing them after failures, Aurora achieves high performance while maintaining strong consistency guarantees.

This approach demonstrates that sometimes the best way to solve distributed systems challenges is to avoid distribution where possible, falling back to consensus protocols only when absolutely necessary.

Key Concepts

  • Quorum Model:

    • A system where operations are considered successful when acknowledged by a majority of nodes.
    • Aurora uses a 4/6 write quorum and a 3/6 read quorum across 6 storage nodes.

  • Consistency Points:

    • Local markers that track the progress of writes across the distributed system.
    • Include concepts like Segment Complete LSN (SCL), Protection Group Complete LSN (PGCL), and Volume Complete LSN (VCL).

  • Asynchronous Writes:

    • All write operations, including commits, are sent asynchronously to storage nodes.
    • Improves performance by avoiding synchronous network round trips.

  • Log-Structured Storage:

    • Only redo log records are sent from the database to storage nodes.
    • Storage nodes are responsible for applying these logs to materialize data blocks.

  • Read Replicas:

    • Replicas that share the same underlying storage as the primary instance.
    • Allow quick scaling of read capacity without data copying.

  • Quorum Sets:

    • A technique for changing quorum membership through a series of reversible transitions.
    • Allows non-blocking, fault-tolerant updates to the quorum.

  • Epochs:

    • Monotonically increasing numbers used to detect and reject stale operations.
    • Used for volume, membership, and geometry changes.

  • Heterogeneous Quorums:

    • Use of different types of segments ("full" and "tail") in quorums to optimize for cost and performance.

  • Local State Leverage:

    • Extensive use of local, ephemeral state to avoid distributed consensus in common operations.
    • State is reconstructed after failures rather than maintained distributedly.

  • Structural Consistency:

    • Techniques to ensure replicas maintain consistent views of data structures like B-trees.

  • Snapshot Isolation:

    • A method to provide transactional isolation across distributed nodes using consistency points.

@inproceedings{verbitski2018amazon,
	title={Amazon aurora: On avoiding distributed consensus for i/os, commits, and membership changes},
	author={Verbitski, Alexandre and Gupta, Anurag and Saha, Debanjan and Corey, James and Gupta, Kamal and Brahmadesam, Murali and Mittal, Raman and Krishnamurthy, Sailesh and Maurice, Sandor and Kharatishvilli, Tengiz and others},
	booktitle={Proceedings of the 2018 International Conference on Management of Data},
	pages={789--796},
	year={2018}
}