Part 2: Technical Overview

Consensus

Preliminaries

  • Consensus is a way to build reliable distributed systems with unreliable components.
  • Blockchain-based distributed systems aim to agree on a single history of transactions.
  • Proof of work and proof of stake are not consensus protocols, but enable consensus protocols.
  • Many blockchain consensus protocols are "forkful".
  • Forkful chains use a fork choice rule, and sometimes undergo reorganisations.
  • In a "safe" protocol, nothing bad ever happens.
  • In a "live" protocol, something good always happens.
  • No practical protocol can be always safe and always live.

Introduction

This section covers the basics of consensus, fork choice, and finality. Most of it is not specific to Ethereum and is for general background understanding.

The challenge a consensus protocol seeks to solve is that of building a reliable distributed system on top of unreliable infrastructure. Consensus protocol research goes back to the 1970s and beyond, but the scale of the challenges we seek to solve in Ethereum are orders of magnitude more ambitious.

Our goal in Ethereum's consensus layer is to enable tens of thousands of independent nodes around the world to proceed completely in lockstep with each other. Each node maintains a ledger containing the state of every account, and every ledger must match every other ledger. There must be no discrepancies; the nodes must agree, and they must come to agreement swiftly. This is what I mean by "a reliable distributed system".

These nodes often run on consumer grade hardware. They communicate over Internet connections that might be low bandwidth, or high latency, that lose packets, or drop out for indefinite periods of time. Node operators sometimes misconfigure their software, or don't keep it up to date. And, to make it all the more exciting, there is the possibility of large numbers of bad actors running rogue nodes or tampering with communications for their own gain. This is what I mean by "unreliable infrastructure".

An explicit design goal for Ethereum is that it doesn't only run well when every node is running well and communicating well. We have done our best to design a system that will do its best to continue running even when the world beneath it is falling apart.

Coming to consensus

The Ethereum network comprises a large number of individual nodes. Each node acts independently, and nodes communicate over an unreliable, asynchronous network, the Internet. Any individual node might be honest – behaving correctly at all times – or faulty in any arbitrary way: simply down or non-communicative, following a different version of the protocol, actively trying to mislead other nodes, publishing contradictory messages, or any manner of other fault.

Users submit transactions to this network of nodes, and the goal of the consensus protocol is that all correct nodes eventually agree on a single, consistent view of the history of transactions. That is, the order in which transactions were processed and the outcome of that processing. So, if I have 1 ETH and I simultaneously tell the network that I am sending that 1 ETH to Alice and also to Bob, we expect that eventually the network will agree that either I sent it to Alice or I sent it to Bob. It would be a failure if both Alice and Bob received my Ether, or if neither received it.

A consensus protocol is the process by which this agreement on the ordering of transactions comes about.

Ethereum's consensus protocol actually "bolts together" two different consensus protocols. One is called LMD GHOST, the other Casper FFG. The combination has become known as Gasper. In subsequent sections we will be looking at these both separately and in combination.

Byzantine generals

In a 1982 paper Leslie Lamport described in rather whimsical terms the fundamental problem that consensus systems are trying to solve - building reliable distributed systems.

We imagine that several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. The generals can communicate with one another only by messenger. After observing the enemy they must decide on a common plan of action.

This formulation makes clear that there is no overall holistic view, no God-mode in which we can see the whole situation in one glance and make a decision. We are simply one of the generals, and our only source of information about the other generals is the messages that we receive - messages that may be correct, or lies, or mistakes based on limited information, or delayed, or modified in transit. We have only a very limited local view, yet we must come to a view about the state of the whole system.

It is important to keep this in mind at all times. When we draw diagrams of block chains and block trees, it is easy to assume that this is somehow "the state" of the whole system. But these diagrams only ever represent the local view of a single participant in the system. My node's view of the system is likely to differ from your node's view of the system, if only temporarily, because we operate over an unreliable network. For example, you will see blocks at different times from when I see them, or in a different order, or even different blocks from those that I see.

Lamport captures the faultiness of the system in the following way.

However, some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement.

These treacherous generals exhibit what we've come to call "Byzantine behaviour", or "Byzantine faults". They can act in any arbitrary way: delaying messages, reordering messages, outright lying, sending contradictory messages to different recipients, failing to respond at all, or any other behaviour we can think of.

A picture of a node with messages coming in.

I receive a ton of messages from other nodes, but I have no idea which are accurate, what order they were sent in, or if any are missing or just delayed. Somehow, we need to reach agreement.

The loyal generals need a method that reliably delivers an outcome on the following terms.

A. All loyal generals decide upon the same plan of action [e.g. "attack" or "retreat"], and

B. A small number of traitors cannot cause the loyal generals to adopt a bad plan.

Achieving consensus in such a Byzantine distributed system is not an easy problem to solve, but there have been several reasonably successful approaches over the years.

The first mainstream solution was the Practical Byzantine Fault Tolerance (PBFT) algorithm published by Liskov and Castro in 1999. This relies on a relatively small and limited set of known consensus participants (called replicas). PBFT is always "safe", in the terms discussed below and does not have forks.

Nakamoto consensus, invented by Satoshi Nakamoto for Bitcoin in 2008, takes a fundamentally different approach. Rather than limiting participants to a known set it uses proof of work to permissionlessly select a temporary leader for the consensus. Unlike PBFT, Nakamoto consensus allows forks and is not formally "safe".

Many, many variants of these and other novel alternatives, such as the Avalanche family of protocols, have since sprung up. Section 7, Related Work, of the Avalanche white paper provides a good survey of the zoo of different consensus protocols currently in use in the blockchain world.

Proof of Stake and Proof of Work

This is a good point at which to mention that neither proof of work nor proof of stake is a consensus protocol in itself. They are often (lazily) referred to as consensus protocols, but each is merely an enabler for consensus protocols.

For the most part, both proof of work and proof of stake are Sybil resistance mechanisms that place a cost on participating in the protocol. This prevents attackers from overwhelming the protocol at low or zero cost.1

Nevertheless, both proof of work and proof of stake are often fairly tightly coupled, via fork choice rules, to the consensus mechanisms that they support. They provide a useful way to assign a weight, or a score, to a chain of blocks: in proof of work, the total work done; in proof of stake, the amount of value that supports a particular chain.

Beyond these basic factors, both proof of work and proof of stake enable many kinds of different consensus protocols to be built on them, each with its own dynamics and trade-offs. Once again, the survey in section 7, Related Work, of the Avalanche white paper is instructive.

Block chains

The basic primitive that underlies blockchain technology is, of course, the block.

A block comprises a set of transactions that a leader (the block proposer) has assembled. A block's contents (its payload) may vary according to the protocol.

  • The payload of a block on Ethereum's execution chain is a list of user transactions.
  • The payload of a block on the pre-Merge proof of stake beacon chain was (mostly) a set of attestations made by other validators.
  • Post-Merge beacon chain blocks also contain the execution payload (the user transactions).
  • As and when EIP-4844 is implemented on Ethereum, blocks will contain commitments to opaque blobs of data alongside the ordered list of user transactions.

Except for the special Genesis block, every block builds on and points to a parent block. Thus, we end up with a chain of blocks: a blockchain. Whatever the contents of blocks, the goal of the protocol is for all nodes on the network to agree on the same history of the blockchain.

A picture of a linear chain of blocks.

A blockchain. Time moves from left to right and, except for the Genesis block, each block points to the parent block it builds on.

The chain grows as nodes add their blocks to its tip. This is accomplished by temporarily selecting a "leader", an individual node that has the right to extend the chain. In proof of work the leader is the miner that first solves the proof of work puzzle for its block. In Ethereum's proof of stake the leader is selected pseudo-randomly from the pool of active stakers.

The leader (usually known as the block proposer) adds a single block to the chain, and has full responsibility for selecting and ordering the contents of that block, though its block must be valid according to the protocol rules otherwise the rest of the network will simply ignore it.

The use of blocks is an optimisation. In principle we could add individual transactions to the chain one by one, but that would add a huge consensus overhead. So blocks are batches of transactions, and sometimes people argue about how big those blocks should be. In Bitcoin, the block size is limited by the number of bytes of data in the block. In Ethereum's execution chain, the block size is limited by the block gas limit (that is, the amount of work needed to run the transactions in the block). Beacon block sizes are limited by hard-coded constants.

Block trees

Our neat diagram of a nice linear chain will for the most part reflect what we see in practice, but not always. Sometimes, due perhaps to network delays, or a dishonest block proposer, or client bugs, any particular node might see something more like the following.

A diagram of a block tree.

In general, we might end up with a block tree rather than a block chain. Again, time moves from left to right and each block points to the parent block it builds on.

In real networks we can end up with something more like a block tree than a block chain. In this example very few blocks are built on their "obvious" parent.

Why did the proposer of block CC build on AA rather than BB?

  • It may be that the proposer of CC had not received block BB by the time it was ready to make its proposal.
  • It may be that the proposer of CC deliberately wanted to exclude block BB from its chain, for example to steal its transactions, or to censor some transaction in BB.
  • It may be that the proposer of CC thought that block BB was invalid for some reason.

The first two reasons, at least, are indistinguishable to the wider network. All we know is that CC built on AA, and we can never know why for certain.

Similarly, why did the proposer of block DD build on BB rather than CC? Any of the above reasons apply, and we can add another:

  • The proposer of DD may have decided on some basis that there was more chance of the wider network eventually including BB than CC. Thus, building DD on BB gives it more chance of making it into the eventual block chain, than building DD on CC.

The various branches in the block tree are called "forks". Forks happen naturally as a consequence of network and processing delays. But they can also occur due to client faults, malicious client behaviour, or protocol upgrades that change the rules, making old blocks invalid with respect to the new rules. The last of these is often called a "hard fork".

The existence of forking in a consensus protocol is a consequence of prioritising liveness over safety, in the terms discussed below: if you were to consult nodes that are following different forks they would give you different answers regarding the state of the system. Non-forking consensus protocols exist, such as PBFT in the classical consensus world and Tendermint in the blockchain world. These protocols always produce a single linear chain and are thus formally "safe". However, they sacrifice liveness on asynchronous networks such as the Internet: rather than forking, they just stop entirely.

Fork choice rules

As we've seen, for all sorts of reasons – network delays, network outages, messages received out of order, malicious behaviour by peers – nodes across the network end up with different views of the network's state. Eventually, we want every correct node on the network to agree on an identical linear view of history and hence a common view of the state of the system. It is the role of the protocol's fork choice rule to bring about this agreement.

Given a block tree and some decision criteria based on a node's local view of the network, the fork choice rule is designed to select, from all the available branches, the one that is most likely to eventually end up in the final linear, canonical chain. That is, it will choose the branch least likely to be later pruned out of the block tree as nodes attempt to converge on a canonical view.

A diagram of a block chain as a subset of the block tree.

The fork choice rule selects a head block from among the candidates. The head block identifies a unique linear block chain running back to the Genesis block.

The fork choice rule selects a branch implicitly by choosing a block at the tip of a branch, called the head block.

For any correct node, the first criterion for any fork choice rule is that the block it chooses must be valid according to the protocol's rules, and all its ancestors must be valid. Any invalid block is ignored, and any blocks built on an invalid block are themselves invalid.

Given that, there are many examples of different fork choice rules.

  • The proof of work protocols in Ethereum and Bitcoin use a "heaviest chain rule"2 (sometimes called "longest chain", though that's not strictly accurate). The head block is the tip of the chain that represents the most cumulative "work" done under proof of work.
  • The fork choice rule in Ethereum's proof of stake Casper FFG protocol is "follow the chain containing the justified checkpoint of the greatest height", and to never revert a finalised block.
  • The fork choice rule in Ethereum's proof of stake LMD GHOST protocol is specified in its name: take the "Greediest Heaviest Observed SubTree". It involves counting accumulated votes from validators for blocks and their descendent blocks. It also applies the same rule as Casper FFG.

We will properly unpack the second and third of these later in their respective sections.

You can perhaps see that each of these fork choice rules is a way to assign a numeric score to a block. The winning block, the head block, has the highest score. The idea is that all correct nodes, when they eventually see a certain block, will unambiguously agree that it is the head and choose to follow its branch whatever else is going on in their own views of the network. Thus, all correct nodes will eventually agree on a common view of a single canonical chain going back to genesis.

Reorgs and reversions

As a node receives new blocks (and, under proof of stake, new votes for blocks) it will re-evaluate the fork choice rule in the light of the new information. Most commonly, a new block will be a child of the block that the node currently views as the head block, and the new block will become the head block.

However, sometimes the new block might be a descendent of some other block in the block tree. (Note that, if the node doesn't already have the parent block of the new block, it will need to ask its peers for it, and so on for any blocks it knows that it is missing.)

In any case, running the fork choice rule on the updated block tree might indicate a head block that is on a different branch from the previous head block. When this happens, the node must perform a reorg (short for reorganisation), also known as a reversion. It will kick out (revert) blocks that it had previously included in its chain, and will adopt the blocks on the new head's branch.

In the following diagram, the node has evaluated block FF to be the head block, hence its chain comprises blocks AA, BB, DD, EE, and FF. The node knows about block CC, but it does not appear in its view of the chain; it is on a side branch.

A diagram of a blockchain prior to a reversion.

At this point, the node believes that block FF is the best head, and therefore its chain is blocks [ABDEF][A \leftarrow B \leftarrow D \leftarrow E \leftarrow F].

Some time later the node receives block GG which is not built on its current head block FF, but on block CC on a different branch. Depending on the details of the fork choice rule, the node might still evaluate FF to be a better head than GG and therefore ignore GG. But in this case we will assume that the fork choice rule indicates that GG is the better head block.

Blocks DD, EE, and FF are not ancestors of GG, so they need to be removed from the node's canonical chain. Any transactions or information those blocks contain will be reverted, as if they were never received. The node must perform a full rewind to the state that it was in after processing block BB.

After rewinding to BB, the node can add blocks CC and GG to its chain and process them accordingly. Once done, the node will have completed the reorganisation of its chain.

A diagram of a blockchain after a reversion.

Now the node believes that block GG is the best head, and therefore its chain must change to the blocks [ABCG][A \leftarrow B \leftarrow C \leftarrow G].

Later, perhaps, a block HH might appear that builds on FF. If the fork choice rule indicates that HH ought to be the new head, then the node will perform a reorg once again, reverting blocks back to BB and replaying the blocks on HH's branch.

Short reorgs of one or two blocks in both proof of work and Ethereum's proof of stake protocol are not uncommon due to network delays in block propagation. Much longer reorgs ought to be exceedingly rare, unless the chain is under attack, or there is a bug in the formulation of – or the clients' implementations of – the fork choice rule.

Safety and Liveness

Two important concepts that crop up frequently when discussing consensus mechanisms are safety and liveness.

Safety

Informally, an algorithm is said to be safe if "nothing bad ever happens".3

Examples of bad things that might happen in the blockchain context could be the double-spend of a coin, or the finalising of two conflicting checkpoints.

An important facet of safety in a distributed system is "consistency". That is, if we were to ask different (honest) nodes about the state of the chain at some point in its progress, such as the balance of an account at a particular block height, then we should always get the same answer, no matter which node we ask. In a safe system, every node has an identical view of the history of the chain that never changes.

Effectively, safety means that our distributed system "behaves like a centralized implementation that executes operations atomically one at a time." (to quote Castro and Liskov). A safe system is, in Vitalik's taxonomy of centralisation, logically centralised.

Liveness

Again informally, an algorithm is said to be live if "something good eventually happens".

In a blockchain context we generally understand this to mean that the chain can always add a new block; it will never get into a deadlock situation in which it will not produce a new block with transactions in it.

"Availability" is another way of looking at this. I want the chain to be available, meaning that if I send a valid transaction to an honest node it will eventually be included in a block that extends the chain.

You can't have both!

The CAP theorem is a famous result in distributed systems' theory that states that no distributed system can provide all three of (1) consistency, (2) availability, and (3) partition tolerance. Partition tolerance is the ability to function when communication between nodes is not reliable. For example, a network fault might split the nodes into two or more groups that can't communicate with each other.

It is easy to demonstrate the CAP theorem in our blockchain context. Imagine that Amazon Web Services goes offline, such that all the AWS hosted nodes can communicate with each other, but none can talk to the outside world. Or that a country firewalls all connections in and out so that no gossip traffic can pass. Either of these scenarios divide the nodes into two disjoint groups, AA and BB.

A diagram of a network partition.

The network is partitioned: the nodes in AA can talk among themselves, but cannot talk to any node in BB, and vice versa.

Let's say that somebody connected to the network of group AA sends a transaction. If the nodes in AA process that transaction then they will end up with a state that is different from the nodes in group BB, which didn't see the transaction. So, overall, we have lost consistency between all the nodes, and therefore safety. The only way to avoid this is for the nodes in group AA to refuse to process the transaction, in which case we have lost availability, and therefore liveness.

In summary, the CAP theorem means that we cannot hope to design a consensus protocol that is both safe and live under all circumstances, since we have no option but to operate across an unreliable network, the Internet.4

Ethereum prioritises liveness

The Ethereum consensus protocol offers both safety and liveness in good network conditions, but prioritises liveness when things are not running so smoothly. In the case of a network partition the nodes on each side of the partition will continue to produce blocks. However, finality (a safety property) will no longer occur on both sides of the partition. Depending on the proportion of stake managed by each side, either one side or neither side will continue to finalise.

Eventually, unless the partition is resolved, both sides will regain finality due to the novel inactivity leak mechanism. But this results in the ultimate safety failure. Each chain will finalise a different history and the two chains will become irreconcilable and independent forever.

Finality

We're going to be discussing finality a good deal over the following sections, which is a safety property of the chain.

Finality is the idea that there are blocks that will never be reverted. When a block has been finalised, all the honest nodes on the network have agreed that the block will forever remain part of the chain's history, and therefore that all of its ancestors will remain in the chain's history. Finality makes your payment for pizza as irrevocable as if you'd handed over cash. It is the ultimate protection against double-spending.5

Some consensus protocols, like classical PBFT, or Tendermint, finalise every round (every block). As soon as a round's worth of transactions has been included on the chain, all the nodes agree that it will be there forever. On the one hand, these protocols are very "safe": once a transaction has been included on-chain, it will never be reverted. On the other hand, they are vulnerable to liveness failures: if the nodes cannot come to agreement – for example, if more than one third of them are down or unavailable – then no transactions can be added to the chain and it will stop dead.

Other consensus protocols, such as Bitcoin's Nakamoto consensus, do not have any finality mechanism at all. There is always the possibility that someone will reveal an alternative, heavier chain. When this happens, all honest nodes must reorg their chains accordingly, reverting whatever transactions they previously processed. Heuristics such as how many confirmations your block has are only approximations to finality, they are not guarantees.6

Ethereum's consensus layer prioritises liveness, but also strives to offer a safety guarantee in the form of finality when circumstances are favourable. This is an attempt to gain the best of both worlds. Vitalik has defended this as follows.7

The general principle is that you want to give users "as much consensus as possible": if there’s >2/3>2/3 then we get regular consensus, but if there's <2/3<2/3 then there’s no excuse to just stall and offer nothing, when clearly it’s still possible for the chain to keep growing albeit at a temporarily lower level of security for the new blocks. If an individual application is unhappy with that lower level of security, it’s free to ignore those blocks until they get finalized.

Finality in Ethereum's consensus layer is delivered by the Casper FFG mechanism that we'll be exploring soon. The idea is that, periodically, all honest validators agree on fairly recent checkpoint blocks that they will never revert. That block and all its ancestor blocks are then "final" - they will never change, and if you consult any honest node in the network about them or their ancestors you will always get the same answer.

A diagram showing a finalised portion of chain and a forkful portion.

The honest nodes have agreed that the checkpoint and all its ancestor blocks are "final" and will never be reverted. There are therefore no forks before the checkpoint. The chain descending from the checkpoint remains liable to forking.

Ethereum's finality is "economic finality". It is theoretically possible for the protocol to finalise two conflicting checkpoints, that is, two contradictory views of the chain's history. However, it is possible only at enormous and quantifiable cost. For all but the most extreme attack or failure scenarios, final means final.

The section on Casper FFG dives into the detail of how this finality mechanism works.

See also

It's always worth reading anything that Leslie Lamport has had a hand in, and the original 1982 paper by Lamport, Shostak, and Pease on The Byzantine Generals Problem contains many insights. While the algorithm they propose is hopelessly inefficient in modern terms, the paper is a good introduction to reasoning about consensus protocols in general. The same is true of Castro and Liskov's seminal 1999 paper Practical Byzantine Fault Tolerance which significantly influenced the design of Ethereum's Casper FFG protocol. However, you might like to contrast these "classical" approaches with the elegant simplicity of proof of work, as described by Satoshi Nakamoto in the 2008 Bitcoin white paper. If proof of work has just one thing in its favour, it is its simplicity.

We've referred above to Gilbert and Lynch's 2012 paper, Perspectives on the CAP Theorem. It is a very readable exploration of the concepts of consistency and availability (or safety and liveness in our context).

The Ethereum beacon chain underwent a seven block reorg in May 2022 due to differences between client implementations of the fork choice rule. These differences were known at the time and thought to be harmless. That proved to be not so. Barnabé Monnot's write-up of the incident is very instructive.

Vitalik's blog post On Settlement Finality provides a deeper and more nuanced exploration of the concept of finality.

Our ideal for the systems we are building is that they are politically decentralised (for permissionlessness and censorship resistance), architecturally decentralised (for resilience, with no single point of failure), but logically centralised (so that they give consistent results). These criteria strongly influence how we design our consensus protocols. Vitalik explores these issues in his article, The Meaning of Decentralization.


  1. In proof of work, the "proof" you bring is a number that makes the block hash a certain value. This proves that you did the work to calculate it. In proof of stake, your proof is a private key that is associated with a deposit of stake on the blockchain. Other proof mechanisms are available, such as proof of space and time.
  2. Contrary to popular belief, Ethereum's proof of work protocol did not use any form of GHOST in its fork choice. This misconception is very persistent, probably due to the Ethereum Whitepaper. I eventually asked Vitalik about it, and he confirmed to me that although GHOST had been planned under PoW it was never implemented due to concerns about some unspecified attacks. The heaviest chain rule was simpler and well tested. It served us well.
  3. The helpful, intuitive definitions of safety and liveness I've quoted appear in short form in Lamport's 1977 paper, Proving the Correctness of Multiprocess Programs, and as stated here in Gilbert and Lynch's 2012 paper, Perspectives on the CAP Theorem.
  4. The CAP theorem is related to another famous result described by Fisher, Lynch and Paterson in their 1985 paper, Impossibility of Distributed Consensus with One Faulty Process, usually called the FLP theorem. This proves that, even in a reliable asynchronous network (that is, with no bound on how long messages can take to be received), just one faulty node can prevent the system from coming to consensus. That is, even this unpartitioned system cannot be both live and safe. Gilbert and Lynch's paper discusses the FLP theorem in section 3.2.
  5. It's worth noting that finality is never absolute. Whatever any protocol claims, if a supermajority of nodes agrees (for example via a software upgrade) to revert a bunch of finalised blocks, then that's going to happen. Ultimately, as in all things, the concept of finality is subservient to social consensus. See On Settlement Finality for further discussion.
  6. At the time of writing, at least one exchange requires 40000 confirmations for deposits from the Ethereum Classic network. That means that forty thousand blocks must be built on top of a block containing the deposit transaction before the exchange will process it, which takes about six days. The requirement reflects concern about the vulnerability of ETC's low hash rate proof of work chain to 51% attacks - it is relatively easy for an attacker to revert blocks at will. The reality is that, in the face of a well-crafted 51% attack, no number of confirmations is truly safe.
  7. The value of this was evident when the beacon chain stopped finalising for around an hour on the 12th of May, 2023. Participation in consensus dropped from over 99% of validators to around 40% for the duration. Ordinary Ethereum users and applications, however, would have barely noticed. Blocks continued to be produced (albeit fewer than normal) and transactions continued to be executed.

Created by Ben Edgington. Licensed under CC BY-SA 4.0. Published 2023-09-29 14:16 UTC. Commit ebfcf50.