Paxos Algorithm

2 min

Paxos is the predecessor of the Raft algorithm. Both algorithms have the same consensus mechanism: approval by a majority of nodes. Paxos is more difficult than Raft because Paxos runs in a peer-to-peer network without a leader. To understand Paxos well, I strongly recommend that you first understand the Raft algorithm.

A Raft node has three roles: Candidate, Leader, and Follower. The leader performs all requests and sorts the operation log. The followers receive the log and vote on the final consensus. The Candidate is only involved during leader elections (For more details on Raft, you can read my previous article.)

A Paxos node has three roles: Proposer, Acceptor, and Learner. The Proposer receives a request and creates it as a proposal. The Acceptor grants permission to the proposal and votes on the final consensus state in the next stage. The Learner saves the final state permanently.

The Paxos algorithm has three committing phases:

  1. Suggest a proposal.
  2. Reject or grant the proposal.
  3. Vote for consensus.

The Proposer makes a proposal ID and sends it to the Acceptor. Each proposal ID represents the node’s identity, and two IDs can be compared. If two Proposers create different proposals simultaneously, the proposal IDs would be:

  • Node A creates a proposal with the ID: [1, A]
  • Node B creates a proposal with the ID: [1, B]

These two IDs can be compared: [1, A] < [1, B]. In a peer-to-peer network running Paxos, there is no single leader to handle all requests and sort them. The comparable ID is used to determine which proposal can be performed first.

When an Acceptor grants permission to a proposal, e.g., [1, B], later proposals with IDs lower than [1, B] will be rejected. The Proposer will know the rejection and reassign a higher proposal ID to send to the Acceptor and obtain permission again.

The next phase is voting on the final state. Because of the peer-to-peer network, each node may hold a different proposal. They must vote for the unique proposal that is granted by a majority of nodes. Then the final consensus state will be sent to the Learner, and the Learner saves it permanently.

There is a trap that we must resolve. Suppose Acceptors A and B are granting two proposals. They might be like this:

A obtains permission for [1,A]
B obtains permission for [2,B]
A suggests [1,A] and is denied
A obtains permission for [3,A]
B suggests [2,B] and is denied
B obtains permission for [3,B]
A suggests [3,A] and is denied

… repeat infinitely …

This will enter an infinite loop. To resolve this problem, we can use a timeout retransmission algorithm like TCP.

The Paxos paper didn’t mention the details about error handling. If a node crashes, how do we continue reaching consensus? The leaderless peer-to-peer network adds more difficulties to development. Some Paxos variants improve error handling by adding a leader with an election algorithm (Raft is an excellent example). When the leader handles all requests, we can ensure the order of all requests. The remaining processing is to synchronize and vote on the sequence of operations, and then commit the final consensus state.


Note: This post was originally published on (One of our makers' personal blog)