# 🗳️ Raft Consensus

Distributed consensus via leader election and log replication.

## Overview
Raft is a consensus algorithm designed to be understandable. It ensures a cluster of nodes agrees on a sequence of commands even when a minority of nodes fail. Unlike Paxos, Raft decomposes consensus into three sub-problems: leader election, log replication, and safety.
This implementation simulates a full Raft cluster with deterministic in-memory message passing, including network partition scenarios for testing fault tolerance.
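The deterministic network is what makes failure scenarios reproducible: messages travel through an in-memory FIFO queue, and a partition is just a predicate that drops messages crossing the cut. The sketch below illustrates the idea; the `envelope` and `network` types and the function names are assumptions for illustration, not the implementation's actual API.

```ocaml
(* Sketch of a deterministic in-memory network; type and function
   names here are illustrative, not the implementation's API. *)
type 'msg envelope = { src : int; dst : int; payload : 'msg }

type 'msg network = {
  queue : 'msg envelope Queue.t;          (* pending messages, FIFO *)
  mutable partition : int -> int -> bool; (* true = link is cut *)
}

let make () = { queue = Queue.create (); partition = (fun _ _ -> false) }

let send net ~src ~dst payload =
  Queue.push { src; dst; payload } net.queue

(* Deliver the next queued message, silently dropping any that would
   cross the simulated partition. *)
let rec step net deliver =
  match Queue.pop net.queue with
  | exception Queue.Empty -> ()
  | { src; dst; payload } ->
    if net.partition src dst then step net deliver
    else deliver ~src ~dst payload
```

Because delivery order depends only on the queue, every run of a scenario replays identically; flipping `partition` mid-run simulates a link failure, and clearing it simulates reconnection.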
## Concepts Demonstrated
- Record types — rich message types (AppendEntries, RequestVote) with named fields
- Variant types — node roles (Follower | Candidate | Leader) and message types
- Mutable state with records — per-node state (term, log, commit index) updated via ref cells
- Pattern matching on messages — dispatching RPCs to handler functions
- Functional simulation — deterministic network model for reproducible testing
- List operations — log manipulation using immutable lists with List.nth, List.length
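For a taste of the log handling, here is a hedged sketch of two helpers one might build from these list operations; the names, the record fields, and the 0-based indexing are assumptions rather than the module's real interface (the Raft paper itself uses 1-based indices).

```ocaml
(* Hypothetical log helpers; names and indexing are assumptions. *)
type log_entry = { term : int; command : string }

(* Term of the entry at [idx], if any; useful for checking the
   paper's prevLogTerm consistency condition in AppendEntries. *)
let entry_term (log : log_entry list) idx =
  if idx < 0 || idx >= List.length log then None
  else Some (List.nth log idx).term

(* Keep only entries before [idx]; used when a follower discovers
   a conflicting suffix and must truncate it. *)
let truncate_from idx log =
  List.filteri (fun i _ -> i < idx) log
```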
## How Raft Works

```ocaml
(* Node roles *)
type role = Follower | Candidate | Leader

(* Each node maintains: *)
type node_state = {
  mutable current_term : int;     (* latest term seen *)
  mutable voted_for : int option; (* candidate voted for *)
  mutable log : log_entry list;   (* command log *)
  mutable commit_index : int;     (* highest committed entry *)
  mutable role : role;
  ...
}

(* Message types mirror the Raft paper *)
type message =
  | AppendEntries of append_entries_req
  | AppendEntriesResp of append_entries_resp
  | RequestVote of request_vote_req
  | RequestVoteResp of request_vote_resp
  | ClientRequest of string

(* Leader election: candidate requests votes *)
(* Log replication: leader sends AppendEntries *)
(* Commitment: entry committed when replicated to majority *)
```
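Dispatching an incoming RPC is then a single `match` on the `message` variant, with each arm calling a handler that mutates the node's record. The handler below is a hedged sketch of the vote-granting rule only; the labelled arguments and standalone types are illustrative stand-ins for the real `request_vote_req` fields.

```ocaml
(* Hedged sketch of the RequestVote handler; field and argument
   names are assumptions, not the implementation's. *)
type role = Follower | Candidate | Leader

type node_state = {
  mutable current_term : int;
  mutable voted_for : int option;
  mutable role : role;
}

(* Returns (term, vote_granted), mirroring the RequestVote response. *)
let handle_request_vote state ~candidate_term ~candidate_id =
  (* Any node seeing a higher term steps down and adopts it. *)
  if candidate_term > state.current_term then begin
    state.current_term <- candidate_term;
    state.role <- Follower;
    state.voted_for <- None
  end;
  let grant =
    candidate_term >= state.current_term
    && (state.voted_for = None || state.voted_for = Some candidate_id)
    (* The paper also requires the candidate's log to be at least as
       up-to-date as ours; that check is elided in this sketch. *)
  in
  if grant then state.voted_for <- Some candidate_id;
  (state.current_term, grant)
```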
## Key Protocol Phases
| Phase | Trigger | Description |
|---|---|---|
| Leader Election | Election timeout expires | A follower becomes a candidate, increments its term, and requests votes from its peers; it wins with a majority of votes. |
| Log Replication | Client submits command | Leader appends to local log, sends AppendEntries RPCs to all followers. |
| Commitment | Majority acknowledgment | Once a majority of nodes have replicated an entry, the leader marks it committed (sketched below the table). |
| Term Transition | Higher term discovered | Any node seeing a higher term steps down to follower and updates its term. |
| Network Partition | Simulated link failure | Nodes in the majority partition elect a new leader; a stale leader steps down on reconnection when it sees the higher term. |
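The commitment rule lends itself to a small function. The sketch below assumes the leader tracks, per node, the highest log index known to be replicated there (the paper's `matchIndex`); the name and array representation are illustrative.

```ocaml
(* Hedged sketch of the commitment rule; [match_index.(i)] is the
   highest log index known replicated on node i, leader included. *)
let majority_commit_index match_index =
  let sorted = Array.copy match_index in
  Array.sort compare sorted;
  (* With indices sorted ascending, the element at position (n-1)/2
     is replicated on at least a majority of the n nodes. *)
  sorted.((Array.length sorted - 1) / 2)
```

One subtlety from the paper is elided here: a leader only commits entries from its own current term by counting replicas; entries from earlier terms become committed indirectly once a current-term entry commits.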
## Safety Properties
- Election Safety: At most one leader per term (a trace check for this is sketched after this list)
- Leader Append-Only: Leader never overwrites or deletes log entries
- Log Matching: If two logs have an entry with the same index and term, all preceding entries are identical
- Leader Completeness: A committed entry appears in all future leaders' logs
- State Machine Safety: If a node applies an entry at index i, no other node applies a different entry at i
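The simulation's event trace makes such properties checkable. Below is a hedged sketch of the Election Safety check; the `(term, node_id)` event list and the function name are assumptions about how a test harness might record elections, not the actual interface.

```ocaml
(* Hedged sketch: assert that no term has two distinct leaders.
   [leaders] is an assumed list of (term, node_id) pairs recorded
   whenever a node wins an election. *)
let election_safety_holds leaders =
  let seen = Hashtbl.create 16 in
  List.for_all
    (fun (term, node_id) ->
      match Hashtbl.find_opt seen term with
      | None -> Hashtbl.add seen term node_id; true
      | Some id -> id = node_id)
    leaders

(* Usage: assert (election_safety_holds recorded_events) *)
```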
## Running

```sh
# Compile and run the simulation (native code)
ocamlfind ocamlopt -package unix -linkpkg raft.ml -o raft && ./raft

# Or with bytecode
ocamlfind ocamlc -package unix -linkpkg raft.ml -o raft && ./raft
```
## Further Reading

- Raft Visualization — interactive Raft simulator
- In Search of an Understandable Consensus Algorithm — the original Raft paper (Ongaro & Ousterhout, 2014)
- Actor Model — related concurrency pattern in this repo
- Source code