- Consensus is a way to build reliable distributed systems with unreliable components.
- Blockchain-based distributed systems aim to agree on an single history of transactions.
- Proof of work and proof of stake are not consensus protocols, but enable consensus protocols.
- Many blockchain consensus protocols are "forkful".
- Forkful chains use a fork choice rule, and sometimes undergo reorganisations.
- In a "safe" protocol, nothing bad ever happens.
- In a "live" protocol, something good always happens.
In this section we'll cover the basics of consensus, fork choice, and finality. Most of this section is not specific to Ethereum and is for general background understanding.
The Ethereum network comprises a large number of individual nodes. Each node acts independently, and nodes communicate over an unreliable, asynchronous network, the Internet. Any individual node might be honest – behaving correctly at all times – or faulty in any arbitrary way: simply down or non-communicative, following a different version of the protocol, actively trying to mislead other nodes, publishing contradictory messages, or any manner of other fault.
Users submit transactions to this network of nodes, and the goal of the consensus protocol is that all correct nodes eventually agree on a single, consistent view of the history of transactions. That is, the order in which transactions were processed and the outcome of that processing. So, if I have 1 ETH and I simultaneously tell the network that I am sending that 1 ETH to Alice and also to Bob, we expect that eventually the network will agree that either I sent it to Alice or I sent it to Bob. It would be a failure if both Alice and Bob received my Ether, or if neither received it.
A consensus protocol is the process by which this agreement on the ordering of transactions comes about.
The consensus protocol in Ethereum 2 actually "bolts together" two different consensus protocols. One is called Casper FFG, the other LMD GHOST. The combination has become known as Gasper. In subsequent pages we will be looking at these both separately and in combination.
We imagine that several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. The generals can communicate with one another only by messenger. After observing the enemy they must decide on a common plan of action.
This formulation makes clear that there is no overall holistic view, no God-mode in which we can see the whole situation in one glance and make a decision. We are simply one of the generals, and our only source of information about the other generals is the messages that we receive - messages that may be correct, or lies, or mistakes based on limited information, or delayed, or modified in transit. We have only a very limited local view, yet we must come to a view about the state of the whole system.
It is important to keep this in mind at all times. When we draw diagrams of block chains and block trees, it is easy to assume that this is somehow "the state" of the whole system. But these diagrams only ever represent the local view of a single participant in the system. My node's view of the system is likely to differ from your node's view of the system, if only temporarily, because we operate over an unreliable network. For example, you will see blocks at different times from when I see them, or in a different order, or even different blocks from those that I see.
Lamport captures the faultiness of the system in the following way.
However, some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement.
These treacherous generals exhibit what we've come to call "Byzantine behaviour", or "Byzantine faults". They can act in any arbitrary way: delaying messages, reordering messages, outright lying, sending contradictory messages to different recipients, failing to respond at all, or any other behaviour we can think of.
The loyal generals need a method that reliably delivers an outcome on the following terms.
A. All loyal generals decide upon the same plan of action [e.g. "attack" or "retreat"], and
B. A small number of traitors cannot cause the loyal generals to adopt a bad plan.
Achieving consensus in such a Byzantine distributed system is not an easy problem to solve, but there have been several successful approaches over the years.
The first mainstream solution was the Practical Byzantine Fault Tolerance (PBFT) algorithm published by Liskov and Castro in 1999. This relies on a relatively small and limited set of known consensus participants (called replicas). PBFT is always "safe", in the terms discussed below and does not have forks.
Nakamoto consensus, invented by Satoshi Nakamoto for Bitcoin in 2008, takes a fundamentally different approach. Rather than limiting participants to a known set it uses proof of work to permissionlessly select a temporary leader for the consensus. Unlike PBFT, Nakamoto consensus allows forks and and is not formally "safe".
Many, many variants of these and other novel alternatives, such as the Avalanche family of protocols, have since sprung up. Section 7, Related Work, of the Avalanche white paper provides a good survey of the zoo of different consensus protocols currently in use in the blockchain world.
This is a good point at which to mention that neither proof of work nor proof of stake is a consensus protocol in itself. They are often (lazily) referred to as consensus protocols, but each is merely an enabler for consensus protocols.
For the main part, both proof of work and proof of stake are Sybil resistance mechanisms that place a cost on participating in the protocol. This prevents attackers from overwhelming the protocol at low or zero cost.
Nevertheless, both proof of work and proof of stake are often fairly tightly coupled, via the fork choice rule, to the consensus mechanisms that they support. They provide a useful way to assign a weight, or a score, to a chain of blocks: in proof of work, the total work done; in proof of stake, the amount of value that supports a particular chain.
Beyond these basic factors, both proof of work and proof of stake enable many kinds of different consensus protocols to be built on them, each with its own dynamics and trade-offs. Once again, the survey in section 7, Related Work, of the Avalanche white paper is instructive.
The basic primitive that underlies blockchain technology is, of course, the block.
A block comprises a set of transactions that a leader (the block proposer) has assembled. A block's contents (its payload) may vary according to the protocol.
- The payload of a block on Ethereum's proof of work chain is an ordered list of user transactions.
- The payload of a block on the pre-Merge proof of stake beacon chain is (mostly) a set of attestations made by other validators.
- As and when EIP-4844 is implemented on Ethereum then blocks will contain opaque blobs of data alongside the ordered list of user transactions.
With the exception of the special Genesis block, every block builds on and points to a parent block. Thus we end up with a chain of blocks: a blockchain. Whatever the contents of blocks, the goal of the protocol is for all nodes on the network to agree on the same history of the blockchain.
A blockchain. Time moves from left to right and, except for the Genesis block, each block points to the parent block it builds on.
The chain grows as nodes add their blocks to its tip. This is accomplished by temporarily selecting a "leader", an individual node that has the right to extend the chain. In proof of work the leader is the miner that first solves the proof of work puzzle for its block. In Ethereum's proof of stake the leader is selected pseudo-randomly from the pool of active stakers.
The leader (usually known as the block proposer) adds a single block to the chain, and has full responsibility for selecting and ordering the contents of that block.
The use of blocks is an optimisation. Each addition to the chain could in principle be a single transaction, but that would add a huge consensus overhead. So blocks are batches of transactions, and sometimes people argue about how big those blocks should be. In Bitcoin, the block size is limited by the number of bytes of data in the block. In Ethereum's proof of work chain, the block size is limited by the block gas limit (that is, the amount of work needed to run the transactions in the block). Beacon block sizes are limited by hard-coded constants.
Our neat diagram of a nice linear chain will for the most part reflect what we see in practice, but not always. Sometimes, due perhaps to network delays, or a dishonest block proposer, or client bugs, any particular node might see something more like the following.
In general we might end up with a block tree rather than a block chain. Again, time moves from left to right and each block points to the parent block it builds on.
In real networks we can end up with something more like a block tree than a block chain. In this example very few blocks are built on their "obvious" parent.
Why did the proposer of block build on rather than ?
- It may be that the proposer of had not received block by the time it was ready to make its proposal.
- It may be that the proposer of deliberately wanted to exclude block from its chain, for example to steal its transactions, or to censor some transaction in .
- It may be that the proposer of thought that block was invalid for some reason.
The first two reasons, at least, are indistinguishable to the wider network. All we know is that built on , and we can never know why for certain.
Similarly, why did the proposer of block build on rather than ? Any of the above reasons apply, and we can add another:
- the proposer of may have decided on some basis that there was more chance of the wider network eventually including than . Thus, building on gives it more chance of making it into the eventual block chain, than building on .
The various branches in the block tree are called "forks". Forks happen naturally as a consequence of network and processing delays, but they can also occur due to client faults, malicious client behaviour, or protocol upgrades that change the rules so as to make old blocks invalid with respect to the new rules. The last of these is often called a "hard fork".
The existence of forking in a consensus protocol is a consequence of prioritising liveness over safety, in the terms discussed below: if you were to consult nodes that are following different forks they would give you different answers regarding the state of the system. Non-forking consensus protocols exist, such as PBFT in the classical consensus world and Tendermint in the blockchain world. These protocols always produce a single linear chain and are thus formally "safe". However, they sacrifice liveness on asynchronous networks such as the Internet: rather than forking, they just stop entirely.
Ultimately, we want every correct node on the network to converge on an identical linear view of history and hence a common view of the state of the system. This convergence is brought about by means of the protocol's fork choice rule.
Given a block tree and some decision criteria based on a node's local view of the network, the fork choice rule is designed to select, from all the available branches, the one that is most likely to eventually end up in the final linear, canonical chain. That is, it will choose the branch least likely to be later pruned out of the block tree as nodes attempt to converge on a canonical view.
The fork choice rule selects a head block from among the candidates. This identifies a unique linear block chain running back to the Genesis block.
The fork choice rule selects a branch implicitly by choosing a block at the tip of a branch, called the head block.
For any correct node, the first criterion for any fork choice rule is that the block it chooses must be valid according to the protocol's rules, and all its ancestors must be valid. Any invalid block is ignored, and any blocks built on an invalid block are themselves invalid.
Given that, there are many examples of different fork choice rules.
- The proof of work protocols in Ethereum and Bitcoin use a "heaviest chain rule"1 (sometimes called "longest chain", though that's not strictly accurate). The head block is the tip of the chain that represents the most cumulative "work" done under proof of work.
- The fork choice rule in Ethereum's proof of stake Casper FFG protocol is "follow the chain containing the justified checkpoint of the greatest height", and to never revert a finalised block.
- The fork choice rule in Ethereum's proof of stake LMD GHOST protocol is specified in its name: take the "Greediest Heaviest Observed SubTree". It involves counting accumulated votes from validators for blocks and their descendent blocks. It also applies the same rule as Casper FFG.
We will properly unpack the second and third of these later in their respective sections.
You can perhaps see that each of these fork choice rules is a way to assign a numeric score to a block. The winning block, the head block, has the highest score. The idea is that all correct nodes, when they eventually see a certain block, will unambiguously agree that it is the head and choose to follow its branch whatever else is going on in their own views of the network. Thus, all correct nodes will eventually converge on a common view of a single canonical chain going back to genesis.
As a node receives new blocks (and, under proof of stake, new votes for blocks) it will re-evaluate the fork choice rule in the light of the new information. Most commonly, a new block will be a child of the block that it currently views as the head block. In this case the new block automatically becomes the updated head block (as long as it is valid).
However, sometimes the new block might be a descendent of some other block in the block tree. (Note that, if the node doesn't already have the parent block of the new block, it will need to ask its peers for it, and so on for any blocks it knows that it is missing.)
In any case, running the fork choice rule on the updated block tree might indicate a head block that is on a different branch from the previous head block. When this happens, the node must perform a reorg (short for reorganisation), also known as a reversion. It will kick out (revert) blocks that it had previously included in its chain, and will adopt the blocks on the hew head's branch.
In the following diagram, the node has evaluated block to be the head block, hence its chain comprises blocks and . The node knows about block , but it does not appear in its view of the chain; it is on a side branch.
At this point, the node believes that block is the best head, and therefore its chain is blocks .
Some time later the node receives block which is not built on its current head block , but on block on a different branch. Depending on the details of the fork choice rule, the node might still evaluate to be a better head than and therefore ignore . But in this case we will imagine that the fork choice rule indicates that is the better head block.
Blocks , , and are not ancestors of , so they need to be removed from the node's canonical chain. Any transactions or information those blocks contain must be reverted, as if they were never received. The node must perform a full rewind to the state that it was in after processing block .
After rewinding to , the node can add blocks and to its chain and process them accordingly. After doing this, the node will have completed the reorganisation of its chain.
Now the node believes that block is the best head, and therefore its chain must change to blocks .
Later, perhaps, a block might appear that builds on . If the fork choice rule indicates that ought to be the new head, then the node will perform a reorg once again, reverting blocks back to and replaying the blocks on 's branch.
Short reorgs of one or two blocks in both proof of work and Ethereum's proof of stake protocol are not uncommon due to network delays in block propagation. Much longer reorgs ought to be exceedingly rare, unless the chain is under attack, or there is a bug in the formulation of – or the clients' implementations of – the fork choice rule.
Two important concepts that crop up frequently when discussing consensus mechanisms are safety and liveness.
Informally, an algorithm is said to be safe if "nothing bad ever happens"2.
Examples of bad things that might happen in the blockchain context could be the double-spend of a coin, or the finalising of two conflicting checkpoints.
An important facet of safety in a distributed system is "consistency". That is, if we were to ask different (honest) nodes about the state of the chain at some point in its progress, such as the balance of an account at a particular block height, then we should always get the same answer, no matter which node we ask. In a safe system, every node has an identical view of the history of the chain that never changes.
Effectively, safety means that our distributed system "behaves like a centralized implementation that executes operations atomically one at a time." (to quote Castro and Liskov). A safe system is, in Vitalik's taxonomy of centralisation, logically centralised.
Again informally, an algorithm is said to be live if "something good eventually happens".
In a blockchain context we generally understand this to mean that the chain can always add a new block; it will never get into a deadlock situation in which it will not produce a new block with transactions in it.
"Availability" is another way of looking at this. I want the chain to be available, meaning that if I send a valid transaction to an honest node it will eventually be included in a block that extends the chain.
The CAP theorem is a famous result in distributed systems theory that states that no distributed system can provide all three of (1) consistency, (2) availability, and (3) partition tolerance. Partition tolerance is the ability to function when communication between nodes is not reliable. For example, a network fault might split the nodes into two or more groups that can't communicate with each other.
It is easy to demonstrate the CAP theorem in our blockchain context. Imagine that Amazon Web Services goes offline, such that all the AWS hosted nodes can communicate with each other, but none can can talk to the outside world. Or that a country firewalls all connections in and out so that no gossip traffic can pass. Either of these scenarios divide the nodes into two disjoint groups, and .
The network is partitioned: the nodes in can talk among themselves, but cannot talk to any node in , and vice versa.
Let's say that somebody connected to the network of group sends a transaction. If the nodes in process that transaction then they will end up with a state that is different from the nodes in group , which didn't see the transaction. So, overall, we have lost consistency between all the nodes, and therefore safety. The only way to avoid this is for the nodes in group to to refuse to process the transaction, in which case we have lost availability, and therefore liveness.
In summary, the CAP theorem means that we cannot hope to design a consensus protocol that is both safe and live under all circumstances, since we have no option but to operate across an unreliable network, the Internet.3
The Ethereum 2 consensus protocol prioritises liveness: in the case of a network partition the nodes on each side of the partition will continue to produce blocks. However, finality (a safety property) will no longer occur on both sides of the partition. Depending on the proportion of stake managed by each side, either one side or neither side will continue to finalise.
Eventually, unless the partition is resolved, both sides will regain finality due to the novel inactivity leak mechanism. But this results in the ultimate safety failure. Each chain will finalise a different history and will become irreconcilable and independent forever.
It's worth noting that typical proof of work based algorithms also prioritise liveness over safety. In fact, Bitcoin and Ethereum's proof of work offer no safety guarantee at all; they have no concept of finality. At any time somebody might reveal a heavier chain that rewrites history. Even under non-adversarial conditions, minor forks happen frequently and there is no guarantee that different nodes you will give you the same answers. Exchanges typically use a proxy for safety that requires waiting for a certain number of blocks to be built on top of a transaction before it is considered final, but that's only a statistical guarantee, and is no guarantee at all in the face of a 51% attack.4
Ethereum's proof of stake mechanism prioritises liveness, but unlike proof of work it also strives to offer a safety guarantee under favourable circumstances.
Safety in Ethereum 2 is called "finality", and is delivered by the Casper FFG mechanism that we'll explore shortly. The idea is that, as the blockchain progresses, all honest nodes agree on blocks that they will never revert. That block (a checkpoint) and all its ancestor blocks are then "final" - they will never change, and if you consult any honest node in the network about them or their ancestors you will always get the same answer. Thus, finality is a safety property: nothing bad ever happens.
The honest nodes have agreed that the checkpoint and all its ancestor blocks are "final" and will never be reverted. There are therefore no forks before the checkpoint. The chain descending from the checkpoint remains liable to forking.
Finality in Ethereum 2 is "economic finality". It is theoretically possible for the protocol to finalise two conflicting checkpoints, that is, two contradictory views of the chain's history. However, it is possible only at enormous and quantifiable cost. For all but the most extreme attack or failure scenarios, final means final.
The next section, on Casper FFG, dives into the detail of how this finality mechanism works.
It's always worth reading anything that Lamport has had a hand in, and the original paper by Lamport, Shostak, and Pease on The Byzantine Generals Problem contains many insights. While the algorithm they propose is hopelessly inefficient in modern terms, the paper is a good introduction to reasoning about consensus protocols in general. The same is true of Castro and Liskov's seminal paper Practical Byzantine Fault Tolerance which significantly influences the design of Ethereum's Casper FFG protocol. However, you might like to contrast these "classical" approaches with the elegant simplicity of proof of work, as devised by Satoshi Nakamoto and described in the Bitcoin white paper. If proof of work has just one thing in its favour, it is its simplicity.
We've referred above to Gilbert and Lynch's 2012 paper, Perspectives on the CAP Theorem. It is a very readable exploration of the concepts of consistency and availability (or safety and liveness in our context).
The Eth2 beacon chain underwent a seven block reorg in May 2022 due to differences between client implementations of the fork choice rule. These differences were known at the time and thought to be harmless. That proved to be not so. Barnabé Monnot's write up of the incident is very instructive.
Vitalik's blog post On Settlement Finality provides a deeper and more nuanced exploration of the concept of finality.
Our ideal for the systems we are building is that they are politically decentralised (for permissionlessness and censorship resistance), architecturally decentralised (for resilience, with no single point of failure), but logically centralised (so that they give consistent results). These design criteria strongly influence how we build our consensus protocols. Vitalik explores these issues in his article, The Meaning of Decentralization.
- Contrary to popular belief, Ethereum's proof of work protocol does not use any form of GHOST in its fork choice. I really don't know why this misconception is so persistent - I eventually asked Vitalik about it and he confirmed to me (verbally) that although GHOST had been planned under PoW it was never implemented due to concerns about some unspecified attacks. The heaviest chain rule was simpler and well tested. It has served us well.↩
- The helpful, intuitive definitions of safety and liveness I've quoted appear in short form in Lamport's 1977 paper, Proving the Correctness of Multiprocess Programs, and as stated here in Gilbert and Lynch's 2012 paper, Perspectives on the CAP Theorem.↩
- The CAP theorem is related to another famous result described by Fisher, Lynch and Paterson in their 1985 paper, Impossibility of Distributed Consensus with One Faulty Process, usually called the FLP theorem. This proves that, even in a reliable asynchronous network (that is, with no bound on how long messages can take to be received), just one faulty node can prevent the system from coming to consensus. That is, even this unpartitioned system cannot be both live and safe. Gilbert and Lynch's paper discusses the FLP theorem in section 3.2.↩
- At the time of writing, at least one exchange requires 40000 confirmations for deposits from the Ethereum Classic network. That means that forty thousand blocks must be built on top of a block containing the deposit transaction before the exchange will process it, which takes about six days. The requirement reflects concern about the vulnerability of ETC's low hash rate proof of work chain to 51% attacks - it is relatively easy for an attacker to revert blocks at will. The reality is that, in the face of a well-crafted 51% attack, no number of confirmations is truly safe.↩