Systems

Raft

Raft is a consensus algorithm designed to be understandable. It solves the same problem as Paxos — getting a cluster of machines to agree on a sequence of operations — but decomposes the problem into three subproblems with clear interfaces.

The three subproblems

  1. Leader election — pick one node to coordinate.
  2. Log replication — leader appends entries to followers.
  3. Safety — never let two leaders commit conflicting entries.
stateDiagram-v2
    [*] --> Follower
    Follower --> Candidate: election timeout
    Candidate --> Leader: receives majority votes
    Candidate --> Follower: discovers higher term
    Leader --> Follower: discovers higher term
    Candidate --> Candidate: split vote, restart

Terms and elections

Time is divided into terms, each a monotonically increasing integer. Each term begins with an election. A candidate wins if it gets a majority of votes for its term. If no one wins (split vote), a new term starts.

A node grants its vote if:

  • the candidate’s term \geq its own term, and
  • the candidate’s log is at least as up-to-date as its own.

Log replication

The leader is the only node that accepts writes. Each entry has a (term, index). Replication proceeds in AppendEntries RPCs. An entry is committed once it’s stored on a majority of nodes.

PropertyGuarantee
Election SafetyAt most one leader per term
Leader Append-OnlyLeader never overwrites or deletes its own log entries
Log MatchingIf two logs have an entry with same index+term, all prior entries match
Leader CompletenessCommitted entries appear in every leader of higher terms
State Machine SafetySame index+term ⇒ same command applied

Why Raft over paxos

The Paxos paper is famously hard to read. Raft’s contribution isn’t novelty in safety properties — it’s pedagogical clarity. In practice both work; Raft just has fewer ways to misimplement.

  • paxos — the classical consensus algorithm
  • cap-theorem — Raft is a CP system; minority partitions can’t make progress