# Raft
Raft is a consensus algorithm designed to be understandable. It solves the same problem as Paxos — getting a cluster of machines to agree on a sequence of operations — but decomposes the problem into three subproblems with clear interfaces.
## The three subproblems
- Leader election — pick one node to coordinate.
- Log replication — leader appends entries to followers.
- Safety — never let two leaders commit conflicting entries.
```mermaid
stateDiagram-v2
    [*] --> Follower
    Follower --> Candidate: election timeout
    Candidate --> Leader: receives majority votes
    Candidate --> Follower: discovers higher term
    Leader --> Follower: discovers higher term
    Candidate --> Candidate: split vote, restart
```
## Terms and elections
Time is divided into terms, numbered with monotonically increasing integers. Each term begins with an election. A candidate wins by collecting votes from a majority of nodes for its term. If the vote splits and no candidate wins, the candidates time out and start a new election with a higher term.
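Split votes are made unlikely by giving each node a freshly randomized election timeout, so nodes rarely time out simultaneously. A minimal sketch (the 150–300 ms range is the one suggested in the Raft paper; the helper name is illustrative):

```python
import random

def election_timeout_ms(low=150.0, high=300.0):
    """Pick a fresh randomized election timeout.

    Randomization is what makes repeated split votes unlikely: nodes
    rarely time out at the same moment. The 150-300 ms default range
    follows the Raft paper; real systems tune it to network latency.
    """
    return random.uniform(low, high)

# A follower that hears nothing from a leader for this long becomes a
# candidate, increments its term, and requests votes.
timeout = election_timeout_ms()
assert 150.0 <= timeout <= 300.0
```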
A node grants its vote if:
- the candidate’s term is at least as large as its own (and it hasn’t already voted for a different candidate in that term), and
- the candidate’s log is at least as up-to-date as its own.
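The two conditions above can be sketched as a vote-granting check. This is a hand-written illustration, not any library's API; the function name and parameters are hypothetical, and "at least as up-to-date" is the paper's rule of comparing last log terms first, then log lengths:

```python
def should_grant_vote(my_term, my_voted_for,
                      my_last_log_term, my_last_log_index,
                      cand_id, cand_term,
                      cand_last_log_term, cand_last_log_index):
    """Decide whether to grant a RequestVote (illustrative signature)."""
    if cand_term < my_term:
        return False  # stale candidate from an old term
    if cand_term == my_term and my_voted_for not in (None, cand_id):
        return False  # at most one vote per term
    # "At least as up-to-date": later last-log term wins; on a tie,
    # the longer log wins.
    if cand_last_log_term != my_last_log_term:
        return cand_last_log_term > my_last_log_term
    return cand_last_log_index >= my_last_log_index
```

Note that a node receiving a higher term adopts it (resetting its vote) before applying this check, which is why the `my_voted_for` test only matters when the terms are equal.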
## Log replication
The leader is the only node that accepts writes. Each entry has a (term, index).
Replication proceeds through AppendEntries RPCs. An entry is committed once the leader has stored it on a majority of nodes (with the caveat that a leader only counts entries from its own term toward commitment); the leader then advertises the new commit index in subsequent RPCs.
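The commit rule can be computed from the leader's bookkeeping: the commit index is the highest index stored on a majority. A sketch, assuming a hypothetical `match_index` list holding the highest replicated index per follower:

```python
def commit_index(match_index, leader_last_index):
    """Highest log index replicated on a majority of the cluster.

    match_index: per-follower highest replicated index (hypothetical
    bookkeeping); the leader's own log counts toward the majority.
    Real Raft additionally requires the entry at this index to carry
    the leader's current term before it may be committed.
    """
    indices = sorted(match_index + [leader_last_index], reverse=True)
    # In a cluster of n nodes, the value at 0-based position n // 2 of
    # the descending sort is stored on at least a majority of nodes.
    return indices[len(indices) // 2]

# 5-node cluster: leader at index 5, followers at 5, 4, 2, 2.
# Index 4 is on three of five nodes, so it is committed.
assert commit_index([5, 4, 2, 2], 5) == 4
```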
| Property | Guarantee |
|---|---|
| Election Safety | At most one leader per term |
| Leader Append-Only | Leader never overwrites or deletes its own log entries |
| Log Matching | If two logs have an entry with same index+term, all prior entries match |
| Leader Completeness | Committed entries appear in the logs of all leaders of later terms |
| State Machine Safety | Same index+term ⇒ same command applied |
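The Log Matching property is enforced by a consistency check on the follower side of AppendEntries: the follower rejects the RPC unless its log contains the leader's previous entry, and deletes any conflicting suffix before appending. A sketch, with an illustrative log layout of `(term, command)` tuples and the paper's 1-based indices:

```python
def append_entries(log, prev_log_index, prev_log_term, entries):
    """Follower-side AppendEntries consistency check (sketch).

    log: list of (term, command) tuples; indices are 1-based as in the
    Raft paper, so entry i lives at log[i - 1]. Shapes are illustrative.
    """
    # Reject unless our log has the leader's previous entry; on failure
    # the leader retries with an earlier prev_log_index until logs match.
    if prev_log_index > 0:
        if (len(log) < prev_log_index
                or log[prev_log_index - 1][0] != prev_log_term):
            return False
    for i, entry in enumerate(entries):
        idx = prev_log_index + i          # 0-based slot for this entry
        if idx < len(log):
            if log[idx][0] != entry[0]:   # conflict: same index, new term
                del log[idx:]             # drop the conflicting suffix
                log.append(entry)
            # else: entry already present, leave it alone
        else:
            log.append(entry)
    return True
```

Truncating only on an actual term conflict (rather than unconditionally) matters: a stale, reordered AppendEntries must not erase entries the follower already matched.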
## Why Raft over Paxos
The Paxos paper is famously hard to read. Raft’s contribution is not novelty in its safety properties but pedagogical clarity: in practice both algorithms work, and Raft simply leaves fewer ways to misimplement it.
## Related
- paxos — the classical consensus algorithm
- cap-theorem — Raft is a CP system; minority partitions can’t make progress