Distributed consensus revised Howard, PhD thesis
Welcome back to a new term of The Morning Paper! To kick things off, I’m going to start by taking a look at Dr Howard’s PhD thesis, ‘Distributed consensus revised’. This is obviously longer than a standard paper, so we’ll break things down over a few days. As the title suggests, the topic in hand is distributed consensus:
Single-valued agreement is often overlooked in the literature as already solved or trivial and is seldom considered at length, despite being a vital component in distributed systems which is infamously poorly understood… we undertake an extensive examination of how to achieve consensus over a single value.
What makes this much harder than it might at first appear of course, is the possibility of failures and asynchronous communication. In the face of this, an algorithm for consensus must meet three safety requirements and two progress requirements:
- Non-triviality: the decided value must have been proposed by a participant (so for example, solutions which always choose a fixed pre-determined value are not acceptable)
- Safety: if a value has been decided, no other value will be decided
- Safe learning: if a participant learns a value, it must learn the decided value
- Progress: under a specified set of liveness conditions, if a value has been proposed by a participant then a value is eventually decided
- Eventual learning under a specified set of liveness conditions, if a value has been decided then a value is eventually learned
The first part of the thesis examines the Paxos family of algorithms, and is a great resource for navigating this complex space. A combination of 16 different lemma and theorems are used to prove that Paxos does provide the desired safety and progress guarantees. The proofs are carefully tied back to 10 different properties of the classic Paxos algorithm.
The surprising results of this approach are twofold: firstly, the proof of correctness did not use the full strength of the properties provided and secondly, there are many approaches which satisfy the same properties.
The second part of the thesis uses these insights to generalise Paxos, weakening the requirements for quorum intersection, phase completion, value selection, and epoch allocation. It helps us to tease apart what is essential to the problem of distributed consensus , and what is merely a design choice of a particular solution. The solution space that is opened up offers greater flexibility in performance and reliability trade-offs, new progress guarantees, and improved performance.
These are important results since as well as being notoriously difficult to understand and to implement, Paxos has some well-known and not-so-well-known limitations:
The reliance on majority agreement means that the Paxos algorithm is slow to reach decisions… systems are limited in scale, often to three or five participants, as each additional participant substantially decreases overall performance. It is widely understood that Paxos is unable to reach an agreement if the majority of participants have failed. However, this is only part of the overall picture, failure to reach agreement can result not only from unavailable hosts but also network partitions, slow hosts, network congestion, contention for resources such as persistent storage, clock skew, packet loss and countless other scenarios. Such issues are commonplace in some systems…
Let’s begin our journey by considering one of the simplest solutions possible…
Source/More: Distributed consensus revised – Part I | the morning paper