Part 2: Technical Overview

Consensus

Casper FFG

  • Casper FFG adds finality to the Eth2 consensus protocol.
  • It acts as an overlay on LMD GHOST consensus that modifies its fork choice rule.
  • Casper FFG is classically safe under asynchrony when less than 13\frac{1}{3} of validators are faulty or adversarial.
  • Moreover, slashing allows Casper FFG to provide accountable safety, also known as economic finality, when more than 13\frac{1}{3} are adversarial.

Introduction

Many articles have been written to explain the basic mechanics of Casper FFG – how it works – but there is very little about why it works. I hope that by the end of this section you will feel that you have some insight into why Casper FFG is effective.

The mechanics of Casper FFG are not very complicated. Even so, as you read on, bear in mind that the effectiveness of Casper FFG really comes down to two big ideas. First, the two-phase commit (justification and finalisation) and, second, accountable safety.

The two-phase commit gives Casper FFG classical consensus safety. It makes it possible to declare blocks final, being certain that no honest validator will ever revert them. But that's enforceable only when over two-thirds of the stake is controlled by honest validators, something we cannot always assume.

On top of this, Casper FFG offers an extra guarantee called accountable safety for situations when over one-third of the validators are dishonest. If the chain ever suffers from conflicting finality, at least one-third of the total amount of staked Ether will be burnt. This is enforced by slashing validators that break either of the two Casper commandments.

Overview

Casper FFG is a kind of meta-consensus protocol. It is an overlay that can be run on top of an underlying consensus protocol in order to add finality to it. Recall that finality is the property that there are blocks in the chain that are guaranteed never to be reverted: they will be part of the chain forever. In Ethereum's proof of stake consensus, the underlying protocol is LMD GHOST which does not provide finality - there is always a chance that validators might decide to build a competing chain, and there is no real penalty for doing so. Casper FFG functions as a "finality gadget", and we use it to add finality to LMD GHOST.

Casper FFG takes advantage of the fact that, as a proof of stake protocol, we know who our participants are: the validators that manage the staked Ether. This means that we can use vote counting to judge when we have seen a majority of the votes of honest validators. More precisely, votes from validators managing a majority of the stake - in everything that follows, every validator's vote is weighted by the value of the stake that it manages, but for simplicity we won't spell it out every time.

We operate on an asynchronous network, the Internet, which means that we can tolerate at most 13\frac{1}{3} of the validators being adversarial (or faulty) if we want to achieve both safety and liveness. This is a well-known result in consensus theory1, and the reasoning goes as follows.

  • We have a total of nn validators, of which a number, ff, may be faulty or adversarial in some way.
  • To preserve liveness, we need to be able to make a decision after hearing from only nfn - f validators, since the ff faulty ones might withhold their votes.
  • But this is an asynchronous environment, so the ff non-responders may simply be delayed, and not faulty at all.
  • Therefore we must account for up to ff of the nfn - f responses we have received being from faulty or adversarial validators.
  • To guarantee that we can always achieve a simple majority of honest validators after hearing from nfn - f validators, we require that (nf)/2>f(n - f)/2 > f. That is, n>3fn > 3f.

To summarise, like all classical Byzantine fault tolerant (BFT) protocols, Casper FFG is able to provide finality when less than one-third of the total validator set is faulty or adversarial. When sufficient honest validators have declared a block finalised, all honest validators will follow, and that block will not be reverted. However, as we shall see, unlike classical BFT protocols, Casper FFG is also able to provide economic finality (accountable safety) even when more than one-third of the validator set is faulty or adversarial.

In this section, we will consider Casper FFG on its own terms, without spending much time on how it is integrated with LMD GHOST. This is in the spirit of the Casper FFG paper which has very little to say about the underlying blockchain consensus mechanism. We will look at how the two are combined into Gasper in the next section.

Naming

Once again, the name Casper FFG comprises two parts, and it's worth looking at them both.

Casper

The Casper part of the name seems to be due to Vlad Zamfir. He explains in Part 5 of his History of Casper as follows.

In this chapter I recount the story of Casper’s birth as an application of the principles of Aviv Zohar and Jonatan Sompolinsky’s GHOST to proof-of-stake.

I called it "the friendly ghost" because of incentives designed to guarantee censorship resistance against the oligopolists: incentives that force the cartel to be friendly to non-cartel validators.

The GHOST protocol he mentions is the same as we looked at in the previous section. It helps to make sense of all this if you share the cultural context that Casper the Friendly Ghost is a cartoon character that's been around since the 1940s.

Zamfir's protocol was initially called Casper TFG (The Friendly Ghost), and later renamed Casper CBC (Correct By Construction). Vitalik's Casper FFG grew up alongside Zamfir's Casper TFG/CBC – which presumably explains the naming synergy – but has very little in common with it. Indeed, Casper FFG doesn't even use the GHOST protocol.2

FFG

The FFG part stands for the "Friendly Finality Gadget", as in the title of the Casper FFG paper.

This is clearly a play on Zamfir's TFG name, but also indicates that Casper FFG is not a fully self-contained blockchain protocol, but a "gadget" than can add finality to an underlying consensus protocol.

Terminology

As ever, the jargon that we use is the way into understanding how the protocol is constructed.

Epochs and checkpoints

In order to come to a decision about finality, the Casper FFG mechanism needs to process votes from at least 23\frac{2}{3} of the validator set. In Ethereum, the validator set is potentially very large, and it is infeasible for votes from several hundred thousand validators to be broadcast, gossiped, and processed simultaneously.

To work around this, voting is spread out through the duration of an epoch3, which, in Eth2, is 32 slots of 12 seconds each. At each slot, 132\frac{1}{32} of the total validator set is scheduled to broadcast a vote, so each validator is scheduled to cast a vote exactly once per epoch. For efficiency, we bundle each validator's Casper FFG vote with its LMD GHOST vote, although that's by no means necessary.

To ensure that validators voting at different times during the epoch have something in common to vote for, we make them vote for a checkpoint, which is the first slot of an epoch. The checkpoint in epoch NN is at slot number 32N32N (remembering that slots and epochs are numbered from zero).

Diagram showing an epoch with a checkpoint at the first slot and blocks within each slot.

An epoch is divided into 32 slots, each of which usually contains a block. The first slot of an epoch is its checkpoint. Time increases from left to right.

As an aside, people often talk lazily of finalising epochs, but that's not correct. Casper FFG finalises checkpoints, the first slots of epochs. When we have finalised the checkpoint in epoch NN, we have finalised everything up to and including slot 32N32N. This includes all of epoch N1N-1 and the first slot of epoch NN. But we have not yet finalised epoch NN - it still has 31 unfinalised slots in it.

For the time-being we will assume that every slot has a block in it. This is because the original Casper FFG's checkpoints are based on block heights rather than slot numbers. We will relax this assumption to allow empty slots and checkpoints when we look at Gasper.

Within the protocol, a Checkpoint object simply contains the epoch number of the checkpoint, and the hash tree root of the head block in the epoch's first slot (root):

class Checkpoint(Container):
    epoch: Epoch
    root: Root
Justification and finalisation

Like classical BFT consensus protocols, Casper FFG achieves finality through a two-round process.

In the first round, I broadcast my view of the current epoch's checkpoint (XX, say) to rest of the network, and I hear from the rest of the network what their view is. If a supermajority tells me that they also support XX, that allows me to justify it. Justification is local to my network view: at this stage, I believe that the majority of the network believes that XX is favourable for finalisation. But I don't yet know that the rest of the network has come to the same conclusion. Under adversarial conditions, it is possible that a sufficient majority of the other validators failed to come to a decision on XX. We shall look at such a scenario later.

In the second round, I broadcast the fact that I've heard from a supermajority of validators that they support XX (that is, that I've justified XX) and I hear from the rest of the network whether they believe that a supermajority of validators supports XX (that is, that they have justified XX). If I hear that a supermajority of validators agrees with me that XX is justified, then I will finalise XX. Finalisation is a global property: once a checkpoint is finalised, I know that no honest validator will ever revert it. Even if they haven't yet marked the checkpoint as finalised in their view, I know that they've at least marked it justified, and that there is no (non-slashable) behaviour that will be able to revert that justification.

To summarise, for me to be absolutely certain that the whole network agrees that a block will not be reverted requires the following steps.4

  1. Round 1 (ideally leading to justification):
    1. I tell the network what I think is the best checkpoint.
    2. I hear from the network what all the other validators think is the best checkpoint.
    3. If I hear that 23\frac{2}{3} of the validators agree with me, I justify the checkpoint.
  2. Round 2 (ideally leading to finalisation):
    1. I tell the network my justified checkpoint, the collective view I gained from round 1.
    2. I hear from the network what all the other validators think the collective view is, their justified checkpoints.
    3. If I hear that 23\frac{2}{3} of the validators agree with me, I finalise the checkpoint.

In short, when I justify a checkpoint I make a commitment never to revert it. When I finalise a checkpoint I know that all honest validators are committed to never reverting it.

A diagram illustrating voting in the two rounds.

In Round 1, I broadcast my best checkpoint and hear from all the others their best checkpoint. Ideally this leads to justification. In Round 2, I broadcast what I heard everyone's best checkpoint to be and hear their views. Ideally this leads to finalisation.

Under ideal conditions, each round lasts an epoch, so it takes an epoch to justify a checkpoint and a further epoch to finalise a checkpoint. At the start of epoch NN we are aiming to have justified checkpoint N1N-1 and to have finalised checkpoint N2N-2.

Quantifying that, it takes 12.8 minutes, two epochs, to finalise a checkpoint in-protocol. In Casper FFG, the two rounds are overlapped and pipelined, so that, although it takes 12.8 minutes from end to end to finalise a checkpoint, we can finalise a checkpoint every 6.4 minutes, once per epoch.

Note that it can be possible from outside the protocol to see that a checkpoint is likely to be finalised a little earlier than the full 12.8 minutes, assuming that there is no long chain reorg. Specifically, it is possible to have collected enough votes by 23\frac{2}{3} of the way through the second round, that is after about 11 minutes. However, in-protocol justification and finalisation is done only during end of epoch processing.

An aside on the nomenclature: the terms "finalised" and "justified" do not appear in the classical consensus literature. It's easy to see where "finalised" comes from, but perhaps not so for "justified", which is frankly a peculiar term to be using here. As far as I can tell, its origins are in Vlad Zamfir's Casper TFG protocol. In that work, messages contain a "justification" to support the vote being made. The paper doesn't use the word "justified" anywhere, but I suspect that's where we got it from. In Casper FFG, my "justfication" is having seen evidence that 23\frac{2}{3} of validators like the same checkpoint as I do.

A vote in Casper FFG has two parts: a source checkpoint vote and a target checkpoint vote. These are the source and target fields in an attestation's data:

class AttestationData(Container):
    slot: Slot
    index: CommitteeIndex
    # LMD GHOST vote
    beacon_block_root: Root
    # FFG vote
    source: Checkpoint
    target: Checkpoint

Source and target votes are made simultaneously in the form of a link st{s \rightarrow t}, where ss is the source checkpoint and tt is the target checkpoint.

The role of my target vote is to broadcast my view of what the I think should be the next checkpoint to be justified. In the terms above, it is my round 1 vote. My target vote is a soft (conditional) commitment not to revert that checkpoint as long as I hear from 23\frac{2}{3} of validators that they also commit to that checkpoint.

The role of my source vote is to broadcast that I've seen support from 23\frac{2}{3} of the network for checkpoint ss, and that it is the most recent such checkpoint that I know about. In the terms above, it is my round 2 vote announcing the collective view that I've heard. By making this source vote I am upgrading my previous soft commitment not to revert the checkpoint to a hard (unconditional) commitment never to revert it.

A diagram illustrating how the votes from the two rounds are combined into one attestation.

Casper FFG combines the source and target votes into a single message: a vote for a link st{s \rightarrow t}.

An honest validator's source vote will always be the highest justified checkpoint in its view of the chain. Its target vote will be the checkpoint of the current epoch that descends from the source checkpoint. The source and the target checkpoints do not need to be consecutive; it is permissible to jump checkpoints. But when the beacon chain is running smoothly, the target vote of one epoch will be the source vote of the next epoch.

A diagram showing a valid link from source to target.

A link from a justified checkpoint to a checkpoint on a chain that descends from it is valid. Only checkpoints are shown here; the intermediate blocks have been omitted for clarity.

In a valid link, the source checkpoint will always be an ancestor of the target checkpoint. If it weren't so, I'd be contradicting myself: the source vote announces my commitment never to revert checkpoint ss; if the target checkpoint tt is not descended from ss then that would be a vote to revert ss. However, it is not a slashable offence to publish such an invalid link5.

A diagram showing an invalid link from source to a conflicting target.

A link from a justified checkpoint to a checkpoint on a chain that does not descend from it is invalid. The two checkpoints are said to be conflicting as neither descends from the other.

There are some specific criteria around timeliness that valid Casper FFG votes must satisfy in the Eth2 implementation. We'll discuss these more in the Gasper section as they don't apply to the abstract Casper FFG protocol that we're considering here.

When weighing Casper FFG votes, only votes that have been received in blocks are considered. Unlike with LMD GHOST's fork choice, we do not consider any Casper FFG votes received solely via gossip, by carrier pigeon, or by any other means. This is because we must always have a common record of our decision making around finality, and the block history provides that common record. So, when I said above, "I tell the network", it's shorthand for saying that I broadcast an attestation that will be picked up by a block proposer and included in a block. And when I said, "I hear from the network", it's shorthand for saying that I processed the attestations included in a block.

Also, allow me to reiterate that votes in Casper FFG are weighted according to validators' effective balances. A vote from a validator with a 24 ETH effective balance carries 75% of the weight of a vote from a validator with a 32 ETH effective balance. I will frequently say things like, "the votes of two-thirds of the validators"; you should understand this as, "the votes of validators managing two-thirds of the stake".

As described above, a link is a Casper FFG vote pair that links the source and target checkpoints, st{s \rightarrow t}.

A link st{s \rightarrow t} is a supermajority link when over 23\frac{2}{3} of the validators (weighted by stake) have published the same link (and had their votes included in blocks in a timely way).

The mechanics of Casper FFG

With (most of) the terminology and key concepts behind us, we can now look at how Casper FFG operates in detail. It is fairly straightforward.

Justification

When a node sees a supermajority link from justified checkpoint c1c_1 to checkpoint c2c_2, then it considers checkpoint c2c_2 to be justified.

A diagram showing that a supermajority link justifies a checkpoint.

My node has seen a supermajority link CNCN+2{C_N \rightarrow C_{N+2}}, therefore I mark CN+2C_{N+2} as justified. Justified checkpoints are hatched and marked with "J".

Justification means that I have seen commitments from over 23\frac{2}{3} of the validator set that they will not revert checkpoint c2c_2 as long as they get confirmation from at least 23\frac{2}{3} of validators that also will not revert c2c_2.

Finalisation

When a node sees a supermajority link from justified checkpoint c1c_1 to checkpoint c2c_2, and checkpoint c2c_2 is a direct child checkpoint of c1c_1, then it considers checkpoint c1c_1 to be finalised.

In other words, a finalised checkpoint is a justified checkpoint whose direct child is justified.

A diagram showing that when the direct child of a justified checkpoint is justified, the parent checkpoint is finalised.

My node has seen a supermajority link CNCN+1{C_N \rightarrow C_{N+1}}, therefore I mark CN+1C_{N+1} as justified. Since CN+1C_{N+1} is a direct child of CNC_N in the checkpoint tree, I also make CNC_N as finalised. Finalised checkpoints are cross-hatched and marked with "F".

Finalisation means that I have seen confirmation from over 23\frac{2}{3} of the validators that they have seen commitments from over 23\frac{2}{3} of the validators that they will not revert checkpoint c1c_1. Checkpoint c1c_1 cannot now be reverted without at least 13\frac{1}{3} of validators provably changing their minds, and therefore getting slashed.

The above rule for finalising a checkpoint is the one described in the original Casper FFG paper. In practice we can use a slight generalisation of this rule, without affecting the safety proof. The generalisation is called k-finality and we will look at it a little later.

The Casper commandments

In the Casper FFG paper, each checkpoint has a defined height: if cc is a checkpoint, then h(c)h(c) is the height of that checkpoint. Checkpoint heights increase monotonically with distance from the genesis block.

In the Eth2 implementation of Casper FFG, the checkpoint height is the checkpoint's epoch number: h(c)=c.epochh(c) = \texttt{c.epoch}. Recall that a checkpoint comprises both a block hash and an epoch number. If either of these differs then the checkpoints are different.

Casper FFG's proof of accountable safety relies on any validator that violates either of the following two rules, or "commandments", being slashed.

No double vote

Commandment 1: a validator must not publish distinct votes s1t1{s_1 \rightarrow t_1} and s2t2{s_2 \rightarrow t_2} such that h(t1)=h(t2)h(t_1) = h(t_2).

Simply put, a validator must make at most one vote for any target epoch.

A diagram showing a violation of the no double vote rule by voting for the same target checkpoint from different source checkpoints.

One way to violate the no double vote rule: voting for the same target checkpoint from different source checkpoints: 03{0 \rightarrow 3} and 13{1 \rightarrow 3}.

A diagram showing a violation of the no double vote rule by voting for target checkpoints at the same height on different branches.

Another way to violate the no double vote rule: voting for different target checkpoints in the same epoch: 03{0 \rightarrow 3} and 03{0 \rightarrow 3'}.

No surround vote

Commandment 2: a validator must not publish distinct votes s1t1{s_1 \rightarrow t_1} and s2t2{s_2 \rightarrow t_2} such that h(s1)<h(s2)<h(t2)<h(t1)h(s_1) < h(s_2) < h(t_2) < h(t_1).

That is, a validator must not make a vote such that its link either surrounds, or is surrounded by, a previous link it voted for.

A diagram showing a violation of the no surround vote rule on a single branch.

One way to violate the no surround vote rule: the link 03{0 \rightarrow 3} surrounds the link 12{1 \rightarrow 2}.

A diagram showing a violation of the no surround vote rule on conflicting branches.

Another way to violate the no surround vote rule: again, the link 03{0 \rightarrow 3} surrounds the link 12{1 \rightarrow 2}, albeit on different branches.

Slashing

Any validator that violates either of the Casper commandments is liable to being slashed. This means that some or all of its stake is removed and it is ejected from the validator set.

Slashing underpins the accountable safety guarantee of Casper by putting a price on bad behaviour - specifically, behaviour that could lead to conflicting checkpoints being finalised. Slashing also features in LMD GHOST, and the Incentive Layer chapter has more information on the detailed mechanics of slashing.

Violations of the commandments are potentially difficult to detect in-protocol. In particular, detection of surround votes might require searching a substantial history of validators' previous votes6. For this reason, we rely on external slashing detection services7 to detect slashing condition violations and submit the evidence to block proposers. There needs to be only one such service on the network, as long as it is reliable. In practice we have more, but it is certainly not necessary for every node operator to run a slashing detector.

Once evidence of breaking one of the commandments has been found, it is easy to prove on chain that the validator broke the rules. Validators sign every attestation they publish, so, given two conflicting attestations, it is a simple matter to verify their signatures and show that the validator broke the rules in publishing them.

The protocol as it is presented in the Casper FFG paper assumes that, on proof of having violated a slashing condition, "the validator's entire deposit is taken away". We have implemented a variation of this for Eth2 that scales the proportion of a validator's stake that is forfeit in proportion to total amount of stake slashed in a given period. If 13\frac{1}{3} of all stake violated slashing conditions within a 36 day window, then the whole stake would be forfeit, as in the classic Casper FFG protocol. But if there were very few other slashings in the window, then almost no stake would be forfeit. This nicety does not change the Casper FFG guarantees in practice, at least since the Bellatrix upgrade to the beacon chain gave the PROPORTIONAL_SLASHING_MULTIPLIER constant its final value.

Fork choice rule

Casper FFG's fork choice rule takes the form of a modification to the underlying consensus mechanism's fork choice rule. From the Casper FFG paper, the underlying consensus mechanism must,

follow the chain containing the justified checkpoint of the greatest height.

The pure LMD GHOST protocol always begins its search for the head block from the root of the chain, the genesis block. When modified by Casper FFG's fork choice rule, LMD GHOST will start its search for the head block from the highest justified checkpoint it knows about, and will ignore potential head blocks that do not descend from the highest justified checkpoint. We will discuss this more when we get to Gasper.

This modification to the underlying consensus protocol's fork choice rule is what confers finality. When a node has justified a checkpoint in its local view, it is committed to never reverting it. Therefore, the underlying chain must always include that checkpoint; all branches that do not include that checkpoint must be ignored.

Note that this fork choice rule is compatible with the plausible liveness guarantee of Casper FFG.

The guarantees of Casper FFG

The Casper FFG consensus protocol makes two guarantees that are analogous to, but different from, the concepts safety and liveness in classical consensus: accountable safety, and plausible liveness.

Accountable safety

Classical PBFT consensus can guarantee safety only when less than one-third of validators are adversarial (faulty). If more than one-third are adversarial then it makes no promises at all.

Casper FFG comes with essentially the same safety guarantee when validators controlling less than one-third of the stake are adversarial: finalised checkpoints will never be reverted. In addition, it provides the further guarantee that, if conflicting checkpoints are ever finalised, validators representing at least one-third of the staked Ether will be slashed. This is called "accountable safety". It is accountable in that we can identify exactly which validators behaved badly and punish them directly.

The extra safety this guarantee provides is not safety in the usual consensus protocol sense of the word, but it is safety of a specifically cryptoeconomic nature: bad behaviour is heavily disincentivised by the protocol. It is often referred to as "economic finality".

Proof of accountable safety

The proof of Casper FFG's accountable safety is reasonably intuitive. I will sketch out the proof below more-or-less as it is presented in the Casper FFG paper, but modified for epochs rather than the more abstract "checkpoint height" that the paper uses.

We will prove that, if fewer than 13\frac{1}{3} of validators (weighted by stake) violate a Casper commandment, two conflicting checkpoints cannot both be finalised. We will do this by showing that the only way a conflicting checkpoint could be finalised is for there to be a supermajority link that is surrounded by another supermajority link, which contradicts the assumption at start of this paragraph, since surround votes violate the second commandment.

The first handy observation we'll need is that, under this assumption, at most one checkpoint can be justified in any epoch (in the views of honest nodes). This follows directly from the no double vote commandment.

Let us take two conflicting finalised checkpoints ama_m and bnb_n in epochs mm and nn respectively. Since the checkpoints conflict, neither is a descendent of the other. From the previous observation, we know that mnm \ne n. Without loss of generality, we will take m<nm < n, so that bnb_n is the higher finalised checkpoint.

Now, there must be a continuous series of justified checkpoints leading from the root checkpoint to bnb_n with supermajority links between them. That is, there is a set of kk supermajority links {rbi1,bi1bi2,bi2bi3,,bik1bik}\{{r \rightarrow b_{i_1}},\allowbreak {b_{i_1} \rightarrow b_{i_2}},\allowbreak {b_{i_2} \rightarrow b_{i_3}},\allowbreak \ldots,\allowbreak {b_{i_{k-1}} \rightarrow b_{i_k}}\}, with ik=ni_k = n. This follows from the definition of justification. The set of justified checkpoints leading to bnb_n is thus, B={r,bi1,bi2,bi3,,bik1,bik}\mathcal{B} = \{r, b_{i_1},\allowbreak b_{i_2}, b_{i_3},\allowbreak \ldots,\allowbreak b_{i_{k-1}}, b_{i_k}\}. We can imagine bouncing along supermajority links from the root, landing on each of these checkpoints before jumping to the next.

A diagram showing a chain of justified checkpoints joined by contiguous supermajority links from the root up to a finalised block.

For any finalised checkpoint, such as b10b_{10}, there is a contiguous chain of supermajority links leading to it from the root, rr. The chain of links here justifies the set of checkpoints B={r,b1,b4,b5,b9,b10}\mathcal{B} = \{r, b_1,\allowbreak b_4, b_5,\allowbreak b_9, b_{10}\}. Shaded checkpoints are justified (and maybe finalised); cross-hatched checkpoints are finalised (also marked with "F").

Now consider conflicting finalised checkpoint ama_m. From the definition of finalisation, there must be a supermajority link amam+1{a_m \rightarrow a_{m+1}} from ama_m to am+1a_{m+1} in the next epoch. Clearly neither ama_m nor am+1a_{m+1} is in the set B\mathcal{B}, since that would make ama_m an ancestor of bnb_n and not conflicting. Also, set B\mathcal{B} contains no checkpoint bmb_m or bm+1b_{m+1} since we may have only one justified checkpoint in an epoch.

Given these observations, it is inevitable that the pair of checkpoints (am,am+1)(a_m, a_{m+1}) falls between the epochs of two consecutive elements of B\mathcal{B}, say bij1b_{i_{j-1}} and bijb_{i_j}. That is, there is a jj such that ij1<m<m+1<iji_{j-1} < m < m+1 < i_j.

Finally, we can see that there must be a supermajority link bij1bij{b_{i_{j-1}} \rightarrow b_{i_j}} that surrounds the supermajority link amam+1{a_m \rightarrow a_{m+1}}. Either the surrounding or surrounded link cannot exist unless at least 13\frac{1}{3} of validators have violated the second Casper commandment, which we have assumed they did not.

Therefore, we have proved (by contradiction) that two conflicting checkpoints cannot both be finalised if fewer than 13\frac{1}{3} of validators (by weight) violate a Casper commandment.

A diagram showing that finalising a conflicting checkpoint must involve a surround vote.

Suppose an earlier, conflicting checkpoint a6a_6 is finalised. Finalisation means that there must be a supermajority link a6a7{a_6 \rightarrow a_7}. One of the supermajority links on the bb chain – in this case, b5b9{b_5 \rightarrow b_9} – must span a6a7{a_6 \rightarrow a_7}.

In this proof, we have relied on the definition of finalisation being that there is a supermajority link from the checkpoint being finalised to the checkpoint in the very next epoch. It turns out that we can relax that definition a little and calculate finality in the form of kk-finality. We will discuss this below.

Economic finality

The proof of accountable safety rested on there being,

  1. no two supermajority links that target distinct checkpoints at the same height, and
  2. no two supermajority links such that one surrounds the other.

These conditions are enforced by the two Casper commandments. Since a supermajority link requires support from 23\frac{2}{3} of the validators (by stake), if there are two supermajority links that break one of these rules, then at least 13\frac{1}{3} of the validators must have voted for both of the links. That is, at least 13\frac{1}{3} of validators made a double vote or a surround vote.

As we have seen already, any validator that breaks a commandment is liable to being slashed, that is, having all or part of its stake removed, and being ejected from the validator set. Therefore, in the event of a conflicting finalisation, we have the guarantee that at least 13\frac{1}{3} of the staked Ether will be slashed.

In this way, slashing puts a price on attacking the chain, and a huge (and calculable) cost on successfully attacking the chain8. As Vitalik puts it,

Basically, if a block is finalized, then that block is part of the chain, and it is very, very expensive to cause that to change.

We call this "economic finality". It is not finality enforced by software; it is finality enforced by the cost of an attack. Validators' stakes are a "good behaviour bond" that can be taken away from them if it is ever proved that they broke the protocol's rules. Validators have unique identities in the form of their secret keys that they use to sign all their messages, so it is possible to hold individual validators to account, and punish them very specifically.

You might wonder why we need a concept like economic finality at all. After all, PBFT manages to deliver finality without such a construct. Couldn't the protocol simply refuse to finalise a conflicting checkpoint?

The difference is that in PBFT there is the luxury of a hard safety assumption that fewer than one-third of the validators are adversarial. Similarly, in Casper FFG, an adversary with less than one-third of the stake cannot finalise conflicting checkpoints. However, in the world of permissionless blockchains, we must have some defence for when more than one-third of the stake is adversarial. That defence is slashing, which gives us the economic finality guarantee: if more than one-third of validators are willing to behave badly, we can't prevent them from finalising conflicting checkpoints, but we can put a huge cost on doing so.9

In a blog post, On Settlement Finality, Vitalik puts it like this,

We can't guarantee that "X will never be reverted", but we can guarantee the slightly weaker claim that "either X will never be reverted or a large group of validators will voluntarily destroy millions of dollars of their own capital".

Faced with a greater than one-third attacker and an asynchronous network, it's not useful for validators simply to refuse to finalise conflicting checkpoints in-protocol. Attacks that can finalise conflicting checkpoints depend on partitioning the set of honest validators so that they don't see each other's votes, and don't know what the other side has finalised. Under the treat of such powerful attacks, economic finality is a powerful safety guarantee. The ultimate remedy in the event of conflicting finality is manual intervention. As Vitalik observes in the same post, "the user community around an on-chain asset is quite free to simply apply common sense to determine which fork was not an attack and actually represents the result of the transactions that were originally agreed upon as finalized".

Plausible liveness

Casper FFG on its own does not offer liveness in the classic sense of ensuring that users' transactions get included on chain. All block production and chain building is the responsibility of the underlying consensus mechanism, LMD GHOST in our case.

However, there is a sense in which we want Casper FFG to be live: we always want to be able to continue justifying and finalising checkpoints if at least two-thirds of validators are honest, without any of those validators getting slashed. Conversely, we never want to end up in a deadlock in which it is not possible to finalise a new checkpoint without honest validators being slashed. This accords with the "something good eventually happens" definition of liveness.

In Vitalik's words,

Plausible liveness basically means that "it should not be possible for the algorithm to get 'stuck' and not be able to finalize anything at all".

More formally, from the Casper paper,

Supermajority links can always be added to produce new finalized checkpoints, provided there exist children extending the finalized chain.

The proof runs like this. There will be an existing highest justified checkpoint, aa, and there will be a checkpoint bb at the same height or higher (not necessarily descending from aa) that is the highest checkpoint for which any validator has made a target vote.

Let cc be a checkpoint on the chain descending from aa in epoch h(b)+1h(b) + 1, that is, the epoch directly after bb.

All validators can vote for the link ac{a \rightarrow c} without fear of being slashed because, (1) it cannot be a double vote, since no validator has previously voted with target h(c)h(c), and (2) it cannot be a surround vote since no honest validator can have previously used a source higher than h(a)h(a), nor can it be a surrounded vote since no existing link has a target higher than h(b)h(b).

A diagram showing that we can always find a checkpoint descended from the highest justified that can be finalised without breaking a commandment.

Voting for the link ac{a \rightarrow c} is safe because (1) nobody has previously voted with cc as the target as bb has the highest target votes so far, and (2) it cannot surround another link since there is no higher justified checkpoint than aa to use as a source vote.

Therefore we can justify checkpoint cc. It is then clearly safe for all validators to vote cd{c \rightarrow d} where dd is the direct child of cc, and thus we can finalise cc without breaking either of the commandments.

This requirement for plausible liveness underpins Casper FFG's fork choice rule: the underlying consensus mechanism must follow the chain containing the justified checkpoint of the greatest height. As long as the underlying chain keeps building on top of the highest justified checkpoint then, according to this proof, we are guaranteed to be able to keep finalising checkpoints on it without anyone getting slashed.

Exercise

As an interlude, it's a fun and useful exercise to think through different ways in which Casper FFG could behave.

For example, how could a situation like the following diagram arise? That is, with supermajority links continually skipping checkpoints, resulting in a string of justified checkpoints with none finalised.

A diagram showing a chain of justified checkpoints with supermajority links that continually skip a checkpoint.

Always justifying but never finalising. How can we get into such a situation?

The answer is below, but take a moment to think through the circumstances that could lead to this.

Casper FFG miscellania

Incentives

In LMD GHOST, we saw that correct head votes that are included in a block in the next slot receive a reward. But there was no penalty for late or incorrect head votes.

Rewards and penalties are a little more complex in the implementation of Casper FFG. Validators are well incentivised to make accurate and timely Casper FFG votes: 22% of a validator's potential staking rewards come from source votes, and 41% from target votes. Moreover, inaccurate or late source or target votes are penalised by the same amount as the reward.

To get any Casper FFG reward at all, a validator's source vote must be correct. That is, it must agree with the history that the eventual canonical chain converges on. If the source vote is incorrect, then it's as if the validator has not voted at all and it receives a full penalty for missing both source and target. The rationale for this is that having a wrong source means that it was working on a different branch from the one that became canonical in the end. Thus its votes were competing with rather than supporting the consensus.

To receive the source vote reward, the source vote must be correct, and it must also be timely. If a validator's vote is included in a block within five slots, then it receives the source vote reward, otherwise it receives a penalty of the same size. There is some discussion of the reason for the five slot limit in the Annotated Specification.

To receive the target vote reward, the target vote must be correct, and it must also be timely. If a validator's vote is included in a block within thirty-two slots, then it receives the target vote reward, otherwise it receives a penalty of the same size10.

The Annotated Spec has a full matrix of rewards and penalties for the different degrees of correctness and timeliness of votes.

Dynamic validator sets

In all of the above, we have discussed Casper FFG in terms of a static validator set, assuming that no validators enter or exit the protocol. This is not entirely realistic, as we wish to be able to onboard new stakers and to allow stakers to exit.

The Casper FFG paper discusses how accountable safety can be maintained when the validator set changes from epoch to epoch. It analyses this in terms of forward and rear validator sets. The implementation in Ethereum 2.0 ignores this mechanism, working around it instead by severely rate-limiting validator activations and exits. Each epoch, we allow activations and deactivations of validators amounting to a fraction of about 0.0015% of the full validator (see CHURN_LIMIT_QUOTIENT).

The effect of this simplification is analysed in the Gasper paper, section 8.6. By rate-limiting entries and exits, but not accounting for forward and rear validator sets, we fractionally reduce the level of accountable safety. That is, it is possible that slightly less than one-third of the stake gets slashed in the event of finalising a conflicting checkpoint. Specifically, if the validator sets between the epochs of the two conflicting finalised checkpoints differ by a staked amount ε\varepsilon, then the economic finality (the minimum amount of stake that will get slashed) becomes 23ε\frac{2}{3} - \varepsilon rather than 23\frac{2}{3}. In practice, the rate limiting of validator set changes is so severe that this difference is negligible.

k-finality

The original Casper paper requires that, to finalise a checkpoint, we must have a supermajority link from that checkpoint to its direct descendant. It turns out that we can generalise this without affecting the validity of the safety proof.

The key observation that the safety proof relies on is the existence of a supermajority link between two consecutive members of B\mathcal{B} spanning the supermajority link amam+1{a_m \rightarrow a_{m+1}} that finalises ama_m. However, the surrounding link on the bb branch must still exist if there is a supermajority link amam+k{a_m \rightarrow a_{m+k}} where all the checkpoints between ama_m and am+ka_{m+k} are justified on the aa branch. That's because B\mathcal{B} cannot have members in the epochs mm to m+km+k if this is so.

A diagram showing that we can safely finalise a checkpoint with a supermajority vote than spans multiple contiguous justified checkpoints.

The guarantees of the accountable safety proof all carry over if we finalise a checkpoint when there is a supermajority link from it that jumps over only justified checkpoints. Here, a5a_5 and a6a_6 are justified, so we can safely finalise a4a_4 with the supermajority link a4a7{a_4 \rightarrow a_7}.

So, our generalised finality rule is that we can finalise checkpoint ama_m when we have a supermajority link amam+k{a_m \rightarrow a_{m+k}} and checkpoints am+1a_{m+1}, am+2a_{m+2}, \ldots, am+k1a_{m+k-1} are all justified.

This is called kk-finality and is discussed in section 4.5 of the Gasper paper.

How many checkpoints kk one might want to consider when calculating finality depends on how much record keeping one is prepared to do. On the Ethereum 2.0 beacon chain we have adopted 22-finality: we keep a record of the justification status of four consecutive epochs, and allow the processing of target votes for two epochs (older target votes are considered invalid).

A diagram of the four 2-finality scenarios.

The four cases of 2-finality. In each case the supermajority link causes the checkpoint at its start (the source) to become finalised and the checkpoint at its end (the target) to become justified. Cases 2 and 4 are classic 1-finality. Checkpoint numbers are along the bottom.

Almost always, we would expect to see only the 11-finality cases, in particular, case 4. The 22-finality cases would occur only in situations where many attestations are delayed, or when we are very close to the two-thirds participation threshold. Note that these evaluations stack, so it is possible for rule 2 to finalise Cn2C_{n-2} and then for rule 4 to immediately finalise Cn1C_{n-1}, for example.

The detailed mechanics of this are performed in the weigh_justification_and_finalization() function during epoch processing11.

Why two-thirds?

Where does the 23\frac{2}{3} majority threshold for finalisation come from? It's actually not obvious, and we could have chosen a different value to define a supermajority link. Let's call this threshold pp: if a proportion pp of the validator set votes to finalise a checkpoint then it is finalised.

We are seeking to balance two factors. On the one hand, a proportion 1p1-p of adversarial or faulty validators can prevent finality, which is a kind of liveness failure.

On the other hand, we want to maximise the accountable safety of finalisation. That is, the proportion of the stake that must equivocate in order to finalise conflicting checkpoints. This proportion is 2p12p-1.

Given these constraints, setting p=23p = \frac{2}{3} maximises fault tolerance against liveness attacks – less than one third of validators cannot prevent finalisation – while maximising tolerance to safety faults – at least one third of validators would be slashed if conflicting blocks were finalised.

Graph showing the tradeoff between liveness fault tolerance and safety fault tolerance.

The threshold that maximises both accountable safety and tolerance to liveness faults together is p=23p = \frac{2}{3}.

Vitalik wrote some notes on this in footnote 2 of the Minimal Slashing Conditions article.

How not to get slashed

The two Casper commandments are quite simple, and avoiding being slashed is straightforward in principle: just don't break the rules.

By far the most common reason that validators get slashed is due to their secret signing key being run on different nodes at the same time. If the nodes' views of the network diverge at all – for example, due to a checkpoint block being seen late by one, but not the other – then each instance of the validator can end up signing different votes for the same epoch, violating the first commandment.

The basic way to avoid this is simply not to do that. Only ever run your keys in one place at any time. Client software implementations often provide a defensive mechanisms such as doppelganger detection to help to protect stakers from doing this inadvertently. Doppelganger detection will wait for a couple of epochs before starting to sign attestations; if, during this time, it sees signatures on chain from another instance with the same keys, it refuses to start. An alternative is to delegate signing of votes to a centralised signer that maintains a database of your validator's past votes, and will refuse to sign anything that violates the commandments. One such signing service is Web3Signer.

It is possible in rare instances for even a single instance of a validator to get into a situation where it could make slashable attestations. For example, if the host machine's clock were to jump backward in time. Or when there is a long reversion in the chain, greater than an epoch, a validator's duties can be recalculated. It may find that it has voted in the epoch once already due to its old duties, and then its new duties call for it to vote again later in the epoch, which will lead to a slashable double vote.

For reasons like these, all client software includes slashing protection mechanisms. Different clients take different approaches, but there is an agreed common interchange format for the slashing protection data, EIP-3076, that can be used for migrating between clients if necessary. Teku takes a very robust, minimalist approach to this. For each validator managed by a Teku node, it maintains a text file with the epoch numbers of the validator's highest source vote and highest target vote to date. For a new attestation to be signed, its source must not be lower than the stored source, and its target must be higher than the stored target. This is sufficient to guarantee that a single instance of a Teku validator will not make a double vote for the same target epoch, and will not make a surround vote.

Casper FFG vs PBFT

As Vitalik acknowledges, the technical roots of Casper FFG are in the classical BFT (Byzantine fault tolerant) consensus protocols developed during the 1980s and 1990s. In particular, it has some similarities to the PBFT (Practical Byzantine Fault Tolerance) algorithm that was published in 1999.

Nevertheless, Casper FFG is not PBFT, and there are some marked differences between the two. What follows is not a rigorous comparison, but touches on the main points.

Both PBFT and Casper FFG are "round based", and involve a two-phase commit process. In Casper FFG, a full round is two epochs, but rounds are overlapped, or pipelined, so the PREPARE step of one round (justification) coincides with the COMMIT step of the previous round (finalisation). This overlap allows Casper FFG to use only one message type (an attestation) that contains two votes (the source and target). Classical PBFT relies on replicas (validators) broadcasting separate PREPARE and COMMIT messages, and rounds are strictly sequential.

Both PBFT and Casper FFG rely on a leader in some way. The leaders in Casper FFG are the block proposers of the underlying consensus mechanism, and are expected to change every round. The leader in PBFT is called the primary, and is only changed if it is deemed by the other replicas to be offline or faulty. PBFT has a whole "view change" mechanism to handle switching to a new leader when necessary.

Importantly, PBFT will stall if more than one-third of validators are offline since it will be unable to execute a view change. In these circumstances, block production would halt completely, which is a consequence of PBFT and its close relatives favouring safety over liveness. Casper FFG will also stall if more than one-third of validators offline in the sense that finalisation will not move forward. However, there is nothing to prevent the underlying chain from continuing to make progress and provide liveness to the system as a whole. This results from Casper FFG's nature as an overlay on top of an underlying block proposal mechanism.

Finally, the safety guarantees of each differ. Safety (finality) in PBFT is a guarantee that, when fewer than one-third of the replicas are faulty, the output of a round will never be changed. Of course, this is subject as always to social consensus. PBFT is a permissioned protocol, falling into the proof of authority category. If all the authorised participants colluded to update their software, they could revert the state of the system quite easily.

Safety in Casper FFG adds the further cryptoeconomic guarantee that a conflicting checkpoint cannot be finalised without burning at least one-third of the stake. This is a substantially different type of guarantee, but fits well with the permissionless nature of Ethereum's proof of stake protocol.

Are the Casper commandments optimal?

There's an interesting discussion to be had around whether the two Casper commandments are ideal or not. For example, there are situations, like the first surround vote shown above, that are harmless, but would nevertheless lead to the validator being slashed. Daniel Lubarov discussed another such scenario in a post that proposes replacing the second commandment with, "a validator must be prohibited from casting a finalization vote within the span of another vote" (we currently prohibit all spanned votes).

Along similar lines, Justin Drake has proposed a tight and intuitive Casper slashing condition that unifies the two commandments into a single commandment.

A validator must not cast a vote st{s \rightarrow t} that "hops over" one of his finalisation votes s~t~{\tilde{s} \rightarrow \tilde{t}}, i.e. h(s)h(s~)h(s) \le h(\tilde{s}) and h(t)h(t~)h(t) \ge h(\tilde{t}) [and such that the target votes conflict].

Jacob Eliosoff goes further by suggesting that we remove the source vote altogether and rework the slashing rules accordingly.

While worth thinking through, I think that proposals like these have failed to gain traction simply because what we have is good enough, on the principle of "if it ain't broke, don't fix it". Much work has been done on implementing, analysing, and formally verifying the current version of Casper FFG, and making any change now that does not gain us a huge benefit (such as single slot finality) is unlikely to be worth the effort.

Conflicting justification

If you want to test your ability to reason about this distributed consensus stuff – and it's not easy – it's worth thinking through why we need both a justified and a finalised status. Why can't I immediately mark as finalised any checkpoint for which I've seen a 23\frac{2}{3} supermajority vote?

The key point is that justification is a local property; finality is global.

For me to justify a checkpoint means that I've heard from 23\frac{2}{3} of the validators that they think that the checkpoint is good. But this is only my local view, since the other validators might have heard something different - I have no idea whether they did or didn't. Nevertheless, as an honest validator, I commit to never reverting any checkpoint that I've justified in my local view.

For me to finalise a checkpoint means that I've heard back from 23\frac{2}{3} of the validators that they've heard from 23\frac{2}{3} of the validators that the checkpoint is good. I now know that 23\frac{2}{3} of the validators know that 23\frac{2}{3} of validators have marked that checkpoint as justified, and therefore that a supermajority of validators globally has committed to never reverting it. That checkpoint is globally safe from reversion.

This is all a bit mind-bending, so perhaps it's best understood by looking at ways things can fail if we don't do the full round-trip of confirming that everyone confirms what I have confirmed12.

Simplified model

Let's consider an extreme case. We have four validators, AA, BB, CC, and DD. They are all honest, but the network can suffer indefinite delays. For illustration purposes we will have a checkpoint at every block height. We'll arbitrarily number the epochs from 00, but block 00 is not intended to be the genesis block - it could be, but that would slightly alter the description that follows.

Epoch 1 - setup

A diagram showing validator A and validators B, C and D having a common view in epoch 1.

In epoch 11, block 11 contains enough votes to justify checkpoint 00. Everybody votes 01{0 \rightarrow 1}.

Recall that only Casper FFG votes contained in blocks affect justification and finalisation. At the start, all the validators have a common view. They all see block 11, and it contains enough votes to justify checkpoint 00. So far this is the happy flow. All four validators make the same vote 010 \rightarrow 1: their source checkpoint is 00, which they just justified; their target checkpoint is 11, the current epoch's checkpoint. It's a supermajority vote that ought to lead to checkpoint 11 being justified by everybody.

Epoch 2 - inconsistent justification

This is where things go wrong. In epoch 22, validator AA is chosen to propose, but for some reason AA's block is not seen by the others - it was severely delayed on the network, perhaps by a denial of service attack. Therefore, validators BB, CC and DD don't see the supermajority link it contains that justifies checkpoint 11, so they don't know that anyone else likes checkpoint 11. We now have split views: validator AA has justified checkpoint 11 and is irrevocably committed to it; the other validators still consider checkpoint 00 to be the highest justified.

A diagram showing that the views of validator A and of validators B, C and D have diverged in epoch 2. Validator A has justified checkpoint 1, but the others have not.

In epoch 22, validator AA proposes block 22. It contains all four 01{0 \rightarrow 1} votes, but validators BB, CC and DD never see it. Validator AA has seen a supermajority link 01{0 \rightarrow 1}, so it marks checkpoint 11 as justified. Validators BB, CC and DD saw no votes in epoch and their best justified checkpoint remains 00. AA votes 12{1 \rightarrow 2}; BB, CC and DD vote 0X{0 \rightarrow X}, where XX is an empty checkpoint.

The source of the issue is that AA has seen everyone else agree, but the others have no evidence of the agreement. As far as BB, CC and DD are aware, each is on its own. They have no alternative but to leave things as-is for now and to try to agree in the next round13.

It's not important for this example, but it's worth noting that, by justifying checkpoint 11, validator AA will have finalised checkpoint 00. Validator AA knows that checkpoint 00 will never be reverted globally since it knows that at least 23\frac{2}{3} of validators consider it justified (they used it as their source vote) and will therefore never revert it, irrespective of whatever else is happening.

As for voting in epoch 22, validator AA will vote 12{1 \rightarrow 2}, and the three others will vote 0X{0 \rightarrow X}, where XX shows that they see the checkpoint in epoch 2 as empty. We haven't discussed empty checkpoints yet – we'll cover them when we get to Gasper – for now you can understand that a vote for empty checkpoint XX is a statement that the head of the chain in epoch 22 remains block 11.

Epoch 3 - conflicting justification

Now assume that the block in epoch 33 is proposed by one of BB, CC or DD. It contains the three 0X{0 \rightarrow X} votes, but cannot contain AA's vote since it has a different source. In turn, validator AA will consider block 3 to be invalid since it contains attestations whose source is not checkpoint 11, AA's highest justified checkpoint. The three votes are enough in the views of BB, CC and DD to form a supermajority, so they duly justify checkpoint XX.

A diagram showing that the views of validator A and of validators B, C and D have diverged further in epoch 3. Validator A has justified checkpoint 1, but the others have justified an empty checkpoint in epoch 2.

In epoch 33, one of BB, CC or DD publishes block 33. It contains the three 0X{0 \rightarrow X} votes, so validators BB, CC and DD have a supermajority link justifying the empty checkpoint XX. Validator AA considers block 33 to be invalid.

At this point, validator AA's situation is irrecoverable. The BCDBCD chain will never make votes with checkpoint 11 as source, so validator AA will never consider their attestations or their blocks valid, and vice-versa. Justification cannot progress on AA's chain since it can see only 14\frac{1}{4} of the voting weight. The only remedy is for the node hosting AA to wipe its database and perform a fresh re-sync onto the canonical chain. Validator AA will then be able to join in again as usual without any risk of violating a Casper commandment14.

Note that we have not yet answered our question. What if validator AA had immediately marked block 11 as final rather than only justified? In this example it would not have made a material difference. Block 11 ultimately appears in the canonical chain progressed by BB, CC and DD, so marking it as final would have been fine.

Now with adversarial action

To see why skipping the justification step is dangerous we will introduce some adversarial action. Let's say that CC and DD are not honest validators.

When all three of BB, CC and DD are acting honestly, as above, in epoch 22 they will vote for the empty checkpoint XX that descends from block 11, then in epoch 33 will justify it. This locks them into including block 11 into their canonical chain, since the Casper FFG fork choice now prevents them from building a conflicting branch.

However, if a majority of BB, CC and DD is dishonest, they can fail to justify XX (by not voting), and in epoch 33 can choose to build a new branch on block 00, which remains their highest justified checkpoint. In so doing, they will orphan block 11. Crucially, they can do this without violating a slashing rule.

A diagram showing that dishonest validators can orphan block 1.

When a majority of BB, CC and DD is not honest, they can fail to justify checkpoint XX and build a branch that orphans block 11.

Now we get to the point. Had validator AA immediately marked block 11 as final, it would be disastrous for any applications or users that relied on validator AA's node for information about the chain. They would have seen a finalised block that never appears in the later canonical chain maintained by BB, CC and DD. That is, they would have seen a "finalised" block being reverted without any slashing, violating Casper FFG's accountable safety guarantee.15

Note that we've framed this in terms of adversarial action. But, actually, the assumption of dishonesty is not necessary to prove our point. Perhaps a network delay prevented the epoch 22 votes from CC and DD from making it on chain. It is evident that the bad consequences of premature finalisation don't depend on there being more than one-third dishonest validators, which is pretty disastrous.

In summary, the only way to guarantee safety is via the two-phase commit. This guarantees that a block marked finalised will always appear in the canonical chain, unless at least one-third of validators are slashed.

Summary and reflections

Using this toy model, we've have seen that we are able to achieve Casper FFG's guarantees only by using the two-phase commit. We cannot skip justification and go straight to finalisation when we see a supermajority link.

Even this radically simplified model of the beacon chain is quite tricky to reason about. But all the confirming and re-confirming of things is very much in the nature of distributed systems. Everything would be much easier if we had a trusted central authority. For example, in a national leadership election, we all cast our votes and the central authority counts them and broadcasts the result - it's a single round process. But the consequences can be very bad if the central authority turns out to be corrupt. The two-phase commit is the cost of making our protocol incorruptible.

History of Casper FFG

The development of the proof of stake consensus protocol for Ethereum 2.0 has a long history which is very well summarised by Vitalik in a tweet thread (consolidated here), and in Vlad Zamfir's memoirs linked from there.

The origins date back as far as January 2014 and Vitalik's Slasher: A Punitive Proof-of-Stake Algorithm article. Although almost nothing of the Slasher algorithm is in use today, it introduced the idea of punishing misbehaving validators for violating protocol rules, thus solving the nothing at stake problem that we discussed under LMD GHOST. Slashing paved the way for the idea of economic finality.

In 2015 Vitalik was working on consensus by bet, a form of which appeared in 2016 in the Ethereum.org 2.0 Mauve Paper. He describes consensus by bet as having been, "a big long, and ultimately unproductive, tangent". However, consensus by bet did pioneer the idea of retrospectively conferring finality on top of a forkful block production mechanism, which is the essence of Casper FFG.

The Casper FFG mechanism that we have today really started taking shape in 2017, after Vitalik had returned to the classical PBFT literature. The design that emerged begins to look quite familiar to us today, although it doesn't yet use the "justification" terminology, still preferring the PREPARE and COMMIT terms of PBFT. A really novel feature it provides is achieving economic finality through the use of slashing conditions. There were four slashing conditions at that time, which were later reduced to two16, with one message type.

Vitalik's Casper FFG and Vlad Zamfir's Casper CBC consensus protocols grew up alongside each other during this period. As Vitalik describes in a blog post in December, 2016, the design imperative for Casper FFG was to, "create a simple proof of stake protocol that would provide desirable properties with as few changes from proof of work as possible". He contrasts this typically pragmatic approach with Zamfir's more purist desire to "rebuild consensus from the ground up". Confusingly, the two resulting Casper protocols have little in common apart from generic proof of stake things. Nevertheless, Vitalik still occasionally expresses the hope that Ethereum might eventually switch to something like Casper CBC.

The first version of the Casper the Friendly Finality Gadget paper was uploaded in October 2017. It was written by Vitalik and Virgil Griffith, and pretty much describes the scheme that we use today - subject to the modifications for Gasper that we'll cover in the next section.

As noted earlier, the original plan had been to apply Casper FFG as an overlay on Ethereum's existing proof of work protocol. This effort progressed quite far, with an EIP-1011 testnet going live on the 31st of December, 2017.

The PoW overlay plan was abandoned in 2018 in favour of moving directly to full proof of stake via a new beacon chain architecture running the Gasper (LMD GHOST plus Casper FFG) consensus architecture - this is the architecture that we have today.

The initial specification for Casper FFG on the beacon chain was maintained on Ethresear.ch (you can see the document history there). It includes kk-finality, a slightly weird version of the Casper FFG fork choice rule, and the two Casper commandments as we use them today. However, the beacon chain now uses 32 slot epochs rather than 64, and we did not implement the dynamic validator set mechanism.

The current Casper FFG specification is maintained as part of epoch processing in the beacon chain's state transition function.

Answer to the Exercise

Here's the answer to the exercise above.

Answer

Suppose that attestations are always delayed by exactly one epoch, so that votes made in epoch NN are not processed until epoch N+1N+1, and consider the following progress of the Casper FFG algorithm.

A diagram showing validators voting for a supermajority link from checkpoint 0 to checkpoint 1.

We start with Checkpoint 0 being justified. During Epoch 1 everyone votes 01{0 \rightarrow 1} as usual. The supermajority link vote is shown dashed because processing the votes will be delayed until the next epoch.

A diagram showing validators voting for a supermajority link from checkpoint 0 to checkpoint 2.

At the end of Epoch 1, no votes have been seen, due to the delay, so Checkpoint 1 remains unjustified. During Epoch 2 everyone votes 02{0 \rightarrow 2}.

A diagram showing validators voting for a supermajority link from checkpoint 1 to checkpoint 3.

At the end of Epoch 2, we can process the Epoch 1 votes, which justify Checkpoint 1 (and finalise 0, but that's irrelevant). Checkpoint 2 remains unjustified as we haven't seen the Epoch 2 votes yet. During Epoch 3 everyone votes 13{1 \rightarrow 3}.

A diagram showing validators voting for a supermajority link from checkpoint 2 to checkpoint 4.

Justification continues to lag one epoch behind due to the delayed attestations. During Epoch 4 everyone votes 24{2 \rightarrow 4}.

A diagram showing a string of justified blocks with supermajority links that skip alternate checkpoints.

We are now perpetually locked into this leap-frog behaviour where votes always skip a checkpoint because the skipped checkpoint's justification always happens one epoch late.

See also

The original Casper the Friendly Finality Gadget paper remains the canonical reference. Although the details of our implementation in Ethereum 2.0 differ in some respects, the foundations remains the same.

That paper also discusses using an "inactivity leak" to recover from catastrophic crashes (section 4.2). I've covered elsewhere how we've implemented the inactivity leak in Eth2.

Once again, Vitalik's Casper history Tweet storm (also available here and here) provides terrific first-hand background to the development of the Casper consensus protocols, CBC and FFG.

In the consensus layer specifications:

One of the better articles I've found on Casper FFG is by Juin Chiu. It is particularly good on the relationship between Casper FFG and classical PBFT. Vitalik's Minimal slashing conditions article contains many insights (even if those slashing conditions did not turn out to be minimal).

Some formal verification work on the guarantees of Casper FFG (as presented in the original paper, that is, without kk-finality, for example) is described in the Verification of Casper in the Coq Proof Assistant (2018) paper. It contains some useful insights that clarify the assumptions behind the plausible liveness proof, in particular.


  1. See, for example, the paper Asynchronous consensus and broadcast protocols (1985) by Bracha and Toueg. The earlier work Reaching Agreement in the Presence of Faults (1980) by Pease, Shostak and Lamport gives the same bound, but is based on synchronous communications in a system where validators can forge messages. Our environment is asynchronous with unforgeable messages, so Bracha and Toueg's analysis is the relevant one.
  2. This is, of course, quite confusing. But it's very far from being the most confusing thing in Ethereum naming, so we just live with it.
  3. The Casper FFG paper generally uses the word "dynasty" for epoch, with a few exceptions. It's the same thing.
  4. The two rounds map roughly onto the PREPARE and COMMIT phases of classical PBFT consensus. The PRE-PREPARE phase of PBFT more or less corresponds to a checkpoint block being broadcast in Casper FFG.
  5. It was a slashable offence to link conflicting checkpoints in earlier versions of Casper FFG, but this was simplified in the Casper FFG design we use today, as per footnote 4 of the Casper FFG paper.
  6. See Protolambda's eth2-surround GitHub repo for some analysis of the challenges of finding surround votes.
  7. Slashing detection software has been created by the Lighthouse team and the Prysm team.
  8. As I write (in July 2023), there is 21.6 million ETH staked on the beacon chain, so a finality reversion would result in at least 7.2 million ETH being slashed. This amounts to $13.7 billion of economic security against reversion, at current prices.
  9. This has some conceptual similarity to the cost of mounting a 51% attack in proof of work. Except that, in PoW, the cost of a successful attack can be zero, since the attacker gains all the block rewards, and the attack can be repeated indefinitely using the same hardware. The economic finality provided by slashing is much higher. In Vlad Zamfir's famous words, getting slashed is "as though your ASIC farm burned down if you participated in a 51% attack".
  10. This will be changed at the Deneb upgrade. Target votes for either the current or previous epoch will be valid, rather than expiring after 32 slots.
  11. Danny Ryan briefly discussed kk-finality in this video from Devcon V.
  12. If you really want to test your understanding of all this "I confirm that everybody has confirmed that everybody has confirmed" stuff, which is at the heart of consensus safety, then I highly recommend tackling the blue eyes puzzle. I think Joseph Poon introduced me to this in early 2018. It took me a good couple of hours of head scratching to get there, but the effort is worth it. The solution is out there, but I urge you to take a good run at the puzzle before looking it up.
  13. They are likely to have seen each others' votes via gossip, and in a more realistic model, with multiple blocks per epoch, are likely to include these votes in later blocks. But it is not guaranteed, and in our contrived, simplified model, is not possible.
  14. It should be recognised that this is a rather extreme and contrived example. It would be very unusual for anything like this to occur in practice.
  15. If we had assumed that CC and DD were controlled by the adversary from the outset, and that the adversary has some control over which of AA or BB sees their blocks, then it is quite simple for the adversary to split the views of AA and BB so that each justifies a checkpoint on a different branch. I shall leave it as an exercise for the reader to show this - the flow is similar to the above.
  16. Footnote four of the Casper FFG paper explains the thinking behind reducing the number of message types from four to two.

Created by Ben Edgington. Licensed under CC BY-SA 4.0. Published 2023-09-29 14:16 UTC. Commit ebfcf50.