Raft is a simple consensus algorithm designed based on these principles:
- Dividing a problem into separate pieces that can be solved easily.
- Majority acceptance to avoid inconsistency.
- Simple logic controls safety.
The most valuable aspect of Raft is not only the algorithm itself but also the method of dividing problems. In this article, I will try to explain how it works.
In a distributed system, the simplest way to keep data consistent is to use a strong leader and replicate every write operation to other servers. The operation can be ordered by time as a log, appending each log entry sequentially. The leader has the authority to order the appending, and the followers copy the same order of log entries in their own logs. For further simplicity, once the log entries have been committed by the majority of servers, they are never overwritten. We just need to consider two problems:
- Who can become the leader?
- When the leader crashes, how do we handle inconsistent states?
Raft elections are similar to real elections, electing a leader through the majority acceptance and defining three roles for each server:
- Follower
- Candidate
- Leader
Each role is exclusive, and the transitions are arranged to serve the election process well:
- Every server is initialized as a Follower.
- A Follower can transition to a Candidate.
- A Candidate can be elected as Leader or revert to a Follower if the election fails.
- A Leader can only become a Follower if it crashes.
When a Follower receives no communication from the Leader over a period of time, the Follower becomes a Candidate and starts an election to request votes. If the Candidate gets the majority of votes, it becomes the new Leader. If one election term elects two leaders, a new election is started again. To prevent this situation, the re-election timeout is set randomly.
Data writing is separated into two phases: syncing the log entry and writing the state. When the Leader receives a request from a client, it first sends the AppendEntries request to Followers. After a majority of servers have received the entries, the Leader then notifies Followers to apply the log entry. Once the majority of servers have applied the log entry and produced the same result, the log has been committed safely. Finally, the Leader responds to the client to confirm that the write operation has been committed successfully.
Majority acceptance is the key in the Raft system. The 2PC (Two-Phase Commit) also relies on majority acceptance:
- Ensuring the majority of servers have received the log entries.
- Ensuring the majority of servers have committed the state write.
If a Leader crashes, a new election will start. At this time, a Follower becoming a Candidate has a key restriction:
- It must have the same committed log entries as the crashed Leader. Raft uses the voting process to prevent a Candidate from winning an election unless its log contains all committed entries.
If a Follower crashes, it duplicates the Leader’s log to keep consistent.
A Follower may receive the AppendEntries RPC from the crashed Leader and then vote for another Candidate. This situation is problematic; Raft uses logic to ensure this situation does not happen, as it has two contradictions:
- The Candidate’s log is shorter than the Follower’s.
- The Candidate’s log is longer than the Follower’s.
These two situations never exist. A Follower can become a Candidate based on the condition that its committed log is equal to the Leader’s, and it is never shorter or longer than the Leader’s.
Raft uses these restrictions to avoid issues:
- The Leader determines the order of appending.
- Log entries only have the append operation.
- Committed logs cannot be overwritten.
- Election timeout is randomized to prevent two leaders from winning the election.
- Follower -> Candidate -> Leader -> Follower: These transitions make leader elections easy.
- Majority acceptance (win election, commit log, become a Candidate).
- A Follower’s log can be forced to duplicate from the Leader’s when conflicts occur.
- 2PC committing, with each step accepted by a majority of servers.
- A Candidate must have a committed log equal to the Leader’s.
References
Note: This post was originally published on liyafu.com (One of our makers' personal blog)
Nowadays, we spend most of my time building softwares. This means less time writing. Building softwares has become my default way of online expression. Currently, we are working on Slippod, a privacy-first desktop note-taking app and TextPixie, a tool to transform text including translation and extraction.