Consensus Protocols in Distributed Systems: From Logical Clocks to Raft

Posted in distributed-systems by Christopher R. Wirz on Wed Sep 18 2024



The field of consensus protocols continues to evolve, with ongoing research and development aimed at improving performance, reliability, and ease of implementation. As distributed systems become increasingly prevalent, understanding these protocols and their trade-offs is crucial for designing robust and scalable systems. Whether you are working with established solutions or developing new ones, the principles behind these consensus protocols provide valuable insights into solving the challenges of distributed computing.

Logical Clocks: The Foundation

Before diving into consensus protocols, it is important to understand the concept of logical clocks, introduced by Leslie Lamport in 1987. Logical clocks address the challenge of ordering events in a distributed system where physical clock synchronization is unreliable.

Key points about logical clocks:

  • They define a partial ordering of events using the "happens before" relation
  • Each event is assigned a Lamport timestamp
  • Timestamps are updated based on local events and message passing
  • They ensure that for any two causally related events, the sending event's timestamp is always smaller than the receiving event's timestamp

While not a consensus protocol itself, logical clocks laid the groundwork for future developments in distributed systems.

Replicated State Machines

Consensus protocols are often used in the context of replicated state machines. This approach:

  • Maintains a consistent log across multiple instances in a distributed system
  • Allows each instance to replay commands in the same order
  • Ensures consistent data reads across all replicas

The core of this system is the Consensus module, which is where protocols like Paxos, ZAB, and Raft come into play.

Paxos: The Theoretical Foundation

Developed by Lamport in the 1990s, Paxos is a consensus protocol that has become the theoretical foundation for many distributed systems. However, it is often considered difficult to understand and implement.

Key aspects of Paxos:

  • Designed for single-event consensus
  • Involves roles like Proposer, Acceptor, and Learner
  • Uses a two-phase commit process: Prepare and Accept
  • Allows multiple Proposers, which can lead to livelock in some scenarios

While Paxos provides a strong theoretical base, it requires significant modifications for practical use in systems like replicated state machines.

Multi-Paxos: Bridging Theory and Practice

Multi-Paxos addresses some of the limitations of basic Paxos:

  • Allows for continuous log replication
  • Introduces a single leader to avoid livelock
  • Ensures all replicas have consistent logs

These modifications make Multi-Paxos more suitable for real-world applications, but implementing it still requires significant engineering effort.

ZAB: ZooKeeper Atomic Broadcast

Developed alongside Apache ZooKeeper, ZAB is a consensus protocol designed to meet specific requirements not addressed by Paxos at the time.

Key features of ZAB:

  • Ensures FIFO execution of client commands
  • Handles leader election and recovery
  • Uses a two-phase commit process for writes
  • Supports stale reads from replicas with a "sync" operation for strong consistency

While effective, ZAB is tightly coupled with ZooKeeper and has not seen widespread adoption outside of that ecosystem.

Raft: Designed for Understandability

Raft, developed in 2014, aims to be more understandable and easier to implement than Paxos while still providing strong consistency guarantees.

Key aspects of Raft:

  • Separates leader election from log replication
  • Uses a term-based approach for leadership and log entries
  • Implements a simpler leader election process
  • Uses AppendEntries RPC for both log replication and heartbeats
  • Allows for log entry modification, unlike append-only approaches

Raft has gained popularity due to its relative simplicity and has been adopted in various distributed systems.

Comparing the Protocols

Each consensus protocol has its strengths and trade-offs:

  • Paxos provides a strong theoretical foundation but is complex to implement
  • ZAB is tailored for ZooKeeper's specific needs
  • Raft prioritizes understandability and ease of implementation

Recent developments have seen optimizations to these protocols, such as Parallel-Raft, which aims to improve performance by allowing parallel commits.

Key Concepts

  • Distributed System A network of independent computers that appear to users as a single coherent system, working together to accomplish shared tasks.
  • Consensus The process by which a group of nodes in a distributed system agree on a single data value or state, ensuring consistency across the system.
  • Logical Clock A mechanism for assigning a partial ordering to events in a distributed system, allowing for the determination of causal relationships between events.
  • Lamport Timestamp A simple logical clock implementation where each event is assigned a number (timestamp), and these numbers are used to order events across different processes in a distributed system.
  • Happens-Before Relation A partial ordering of events in a distributed system, where one event is said to "happen before" another if it could have causally influenced the other.
  • Replicated State Machine A technique for implementing fault-tolerant services by replicating a deterministic state machine across multiple servers, ensuring that all replicas process the same sequence of commands.
  • Consensus Module The core component in a replicated state machine responsible for ensuring all replicas agree on the sequence of commands to be executed.
  • Paxos A consensus protocol designed to ensure agreement among a network of unreliable processors, often considered the foundational algorithm for fault-tolerant distributed systems.
  • Proposer (in Paxos) A role in the Paxos protocol that suggests values to be agreed upon by the group.
  • Acceptor (in Paxos) A role in the Paxos protocol that accepts or rejects proposals from Proposers.
  • Learner (in Paxos) A role in the Paxos protocol that learns and records the agreed-upon values.
  • Two-Phase Commit A distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort the transaction.
  • Multi-Paxos An optimization of the Paxos protocol that allows for more efficient consensus on a series of values by reducing the number of message rounds required.
  • Livelock A situation in distributed systems where individual components are actively operating but the system as a whole fails to make progress.
  • ZAB (ZooKeeper Atomic Broadcast) A consensus protocol developed for Apache ZooKeeper, designed to maintain consistency in a distributed system and handle leader election and recovery.
  • Epoch (in ZAB) A monotonically increasing number associated with each leader in ZAB, similar to the "term" concept in Raft.
  • Raft A consensus algorithm designed to be more understandable than Paxos, which separates key elements like leader election and log replication for clarity.
  • Term (in Raft) A logical time period in the Raft protocol, used to detect obsolete information and used in every Raft communication.
  • Leader Election The process by which nodes in a distributed system select a single node to act as the coordinator or primary decision-maker.
  • Follower (in Raft and ZAB) A passive role that responds to requests from leaders and candidates but does not initiate any actions on its own.
  • Candidate (in Raft) A temporary role assumed by a follower when it believes the leader has failed and initiates an election.
  • Log Replication The process of copying and synchronizing a log of commands or state changes across multiple nodes in a distributed system.
  • AppendEntries RPC (in Raft) A remote procedure call used in Raft for both log replication and as a heartbeat mechanism.
  • Committed Entry A log entry that has been acknowledged by a majority of nodes in the cluster and is safe to apply to the state machine.
  • Parallel-Raft An optimization of the Raft protocol that allows for parallel commit of log entries, improving performance in certain scenarios.
@misc{alibaba2019consensus,
	title = {A Brief Analysis of Consensus Protocol: From Logical Clock to Raft},
	author = {{Alibaba Clouder - Alibaba Cloud Community}},
	year = {2019},
	url = {https://www.alibabacloud.com/blog/a-brief-analysis-of-consensus-protocol-from-logical-clock-to-raft_594675},
	note = {Accessed: 2024-09-09}
}