Part 2: Technical Overview

Consensus

Casper FFG

  • Casper FFG adds finality to the Eth2 consensus protocol.
  • It acts as an overlay on LMD GHOST consensus that modifies its fork choice rule.
  • Casper FFG is classically safe under asynchrony when less than 13\frac{1}{3} of validators are faulty or adversarial.
  • Moreover, slashing allows Casper FFG to provide accountable safety, also known as economic finality, when more than 13\frac{1}{3} are adversarial.

Introduction

Many articles have been written to explain the basic mechanics of Casper FFG – how it works – but there is very little about why it works. I hope that by the end of this section you will feel that you have some insight into why Casper FFG is effective.

The mechanics of Casper FFG are not very complicated. Even so, as you read on, bear in mind that the effectiveness of Casper FFG really comes down to two big ideas. First, the two-phase commit (justification and finalisation) and, second, accountable safety.

The two-phase commit gives Casper FFG classical consensus safety. It makes it possible to declare blocks final, being certain that no honest validator will ever revert them. But that's enforceable only when over two-thirds of the stake is controlled by honest validators, something we cannot always assume.

On top of this, Casper FFG offers an extra guarantee called accountable safety for situations when over one-third of the validators are dishonest. If the chain ever suffers from conflicting finality, at least one-third of the total amount of staked Ether will be burnt. This is enforced by slashing validators that break either of the two Casper commandments.

Overview

Casper FFG is a kind of meta-consensus protocol. It is an overlay that can be run on top of an underlying consensus protocol in order to add finality to it. Recall that finality is the property that there are blocks in the chain that are guaranteed never to be reverted: they will be part of the chain forever. In Ethereum's proof of stake consensus, the underlying protocol is LMD GHOST which does not provide finality - there is always a chance that validators might decide to build a competing chain, and there is no real penalty for doing so. Casper FFG functions as a "finality gadget", and we use it to add finality to LMD GHOST.

Casper FFG takes advantage of the fact that, as a proof of stake protocol, we know who our participants are: the validators that manage the staked Ether. This means that we can use vote counting to judge when we have seen a majority of the votes of honest validators. More precisely, votes from validators managing a majority of the stake - in everything that follows, every validator's vote is weighted by the value of the stake that it manages, but for simplicity we won't spell it out every time.

We operate on an asynchronous network, the Internet, which means that we can tolerate at most 13\frac{1}{3} of the validators being adversarial (or faulty) if we want to achieve both safety and liveness. This is a well-known result in consensus theory1, and the reasoning goes as follows.

  • We have a total of nn validators, of which a number, ff, may be faulty or adversarial in some way.
  • To preserve liveness, we need to be able to make a decision after hearing from only nfn - f validators, since the ff faulty ones might withhold their votes.
  • But this is an asynchronous environment, so the ff non-responders may simply be delayed, and not faulty at all.
  • Therefore we must account for up to ff of the nfn - f responses we have received being from faulty or adversarial validators.
  • To guarantee that we can always achieve a simple majority of honest validators after hearing from nfn - f validators, we require that (nf)/2>f(n - f)/2 > f. That is, n>3fn > 3f.

To summarise, like all classical Byzantine fault tolerant (BFT) protocols, Casper FFG is able to provide finality when less than one-third of the total validator set is faulty or adversarial. When sufficient honest validators have declared a block finalised, all honest validators will follow, and that block will not be reverted. However, as we shall see, unlike classical BFT protocols, Casper FFG is also able to provide economic finality (accountable safety) even when more than one-third of the validator set is faulty or adversarial.

In this section, we will consider Casper FFG on its own terms, without spending much time on how it is integrated with LMD GHOST. This is in the spirit of the Casper FFG paper which has very little to say about the underlying blockchain consensus mechanism. We will look at how the two are combined into Gasper in the next section.

Naming

Once again, the name Casper FFG comprises two parts, and it's worth looking at them both.

Casper

The Casper part of the name seems to be due to Vlad Zamfir. He explains in Part 5 of his History of Casper as follows.

In this chapter I recount the story of Casper’s birth as an application of the principles of Aviv Zohar and Jonatan Sompolinsky’s GHOST to proof-of-stake.

I called it "the friendly ghost" because of incentives designed to guarantee censorship resistance against the oligopolists: incentives that force the cartel to be friendly to non-cartel validators.

The GHOST protocol he mentions is the same as we looked at in the previous section. It helps to make sense of all this if you share the cultural context that Casper the Friendly Ghost is a cartoon character that's been around since the 1940s.

Zamfir's protocol was initially called Casper TFG (The Friendly Ghost), and later renamed Casper CBC (Correct By Construction). Vitalik's Casper FFG grew up alongside Zamfir's Casper TFG/CBC – which presumably explains the naming synergy – but has very little in common with it. Indeed, Casper FFG doesn't even use the GHOST protocol.2

FFG

The FFG part stands for the "Friendly Finality Gadget", as in the title of the Casper FFG paper.

This is clearly a play on Zamfir's TFG name, but also indicates that Casper FFG is not a fully self-contained blockchain protocol, but a "gadget" than can add finality to an underlying consensus protocol.

Terminology

As ever, the jargon that we use is the way into understanding how the protocol is constructed.

Epochs and checkpoints

In order to come to a decision about finality, the Casper FFG mechanism needs to process votes from at least 23\frac{2}{3} of the validator set. In Ethereum, the validator set is potentially very large, and it is infeasible for votes from several hundred thousand validators to be broadcast, gossiped, and processed simultaneously.

To work around this, voting is spread out through the duration of an epoch3, which, in Eth2, is 32 slots of 12 seconds each. At each slot, 132\frac{1}{32} of the total validator set is scheduled to broadcast a vote, so each validator is scheduled to cast a vote exactly once per epoch. For efficiency, we bundle each validator's Casper FFG vote with its LMD GHOST vote, although that's by no means necessary.

To ensure that validators voting at different times during the epoch have something in common to vote for, we make them vote for a checkpoint, which is the first slot of an epoch. The checkpoint in epoch NN is at slot number 32N32N (remembering that slots and epochs are numbered from zero).

Diagram showing an epoch with a checkpoint at the first slot and blocks within each slot.

An epoch is divided into 32 slots, each of which usually contains a block. The first slot of an epoch is its checkpoint. Time increases from left to right.

As an aside, people often talk lazily of finalising epochs, but that's not correct. Casper FFG finalises checkpoints, the first slots of epochs. When we have finalised the checkpoint in epoch NN, we have finalised everything up to and including slot 32N32N. This includes all of epoch N1N-1 and the first slot of epoch NN. But we have not yet finalised epoch NN - it still has 31 unfinalised slots in it.

For the time-being we will assume that every slot has a block in it. This is because the original Casper FFG's checkpoints are based on block heights rather than slot numbers. We will relax this assumption to allow empty slots and checkpoints when we look at Gasper.

Within the protocol, a Checkpoint object simply contains the epoch number of the checkpoint, and the hash tree root of the head block in the epoch's first slot (root):

class Checkpoint(Container):
    epoch: Epoch
    root: Root
Justification and finalisation

Like classical BFT consensus protocols, Casper FFG achieves finality through a two-round process.

In the first round, I broadcast my view of the current epoch's checkpoint (XX, say) to rest of the network, and I hear from the rest of the network what their view is. If a supermajority tells me that they also support XX, that allows me to justify it. Justification is local to my network view: at this stage, I believe that the majority of the network believes that XX is favourable for finalisation. But I don't yet know that the rest of the network has come to the same conclusion. Under adversarial conditions, it is possible that a sufficient majority of the other validators failed to come to a decision on XX. We shall look at such a scenario later.

In the second round, I broadcast the fact that I've heard from a supermajority of validators that they support XX (that is, that I've justified XX) and I hear from the rest of the network whether they believe that a supermajority of validators supports XX (that is, that they have justified XX). If I hear that a supermajority of validators agrees with me that XX is justified, then I will finalise XX. Finalisation is a global property: once a checkpoint is finalised, I know that no honest validator will ever revert it. Even if they haven't yet marked the checkpoint as finalised in their view, I know that they've at least marked it justified, and that there is no (non-slashable) behaviour that will be able to revert that justification.

To summarise, for me to be absolutely certain that the whole network agrees that a block will not be reverted requires the following steps.4

  1. Round 1 (ideally leading to justification):
    1. I tell the network what I think is the best checkpoint.
    2. I hear from the network what all the other validators think is the best checkpoint.
    3. If I hear that 23\frac{2}{3} of the validators agree with me, I justify the checkpoint.
  2. Round 2 (ideally leading to finalisation):
    1. I tell the network my justified checkpoint, the collective view I gained from round 1.
    2. I hear from the network what all the other validators think the collective view is, their justified checkpoints.
    3. If I hear that 23\frac{2}{3} of the validators agree with me, I finalise the checkpoint.

In short, when I justify a checkpoint I make a commitment never to revert it. When I finalise a checkpoint I know that all honest validators are committed to never reverting it.

A diagram illustrating voting in the two rounds.

In Round 1, I broadcast my best checkpoint and hear from all the others their best checkpoint. Ideally this leads to justification. In Round 2, I broadcast what I heard everyone's best checkpoint to be and hear their views. Ideally this leads to finalisation.

Under ideal conditions, each round lasts an epoch, so it takes an epoch to justify a checkpoint and a further epoch to finalise a checkpoint. At the start of epoch NN we are aiming to have justified checkpoint N1N-1 and to have finalised checkpoint N2N-2.

Quantifying that, it takes 12.8 minutes, two epochs, to finalise a checkpoint in-protocol. In Casper FFG, the two rounds are overlapped and pipelined, so that, although it takes 12.8 minutes from end to end to finalise a checkpoint, we can finalise a checkpoint every 6.4 minutes, once per epoch.

Note that it can be possible from outside the protocol to see that a checkpoint is likely to be finalised a little earlier than the full 12.8 minutes, assuming that there is no long chain reorg. Specifically, it is possible to have collected enough votes by 23\frac{2}{3} of the way through the second round, that is after about 11 minutes. However, in-protocol justification and finalisation is done only during end of epoch processing.

An aside on the nomenclature: the terms "finalised" and "justified" do not appear in the classical consensus literature. It's easy to see where "finalised" comes from, but perhaps not so for "justified", which is frankly a peculiar term to be using here. As far as I can tell, its origins are in Vlad Zamfir's Casper TFG protocol. In that work, messages contain a "justification" to support the vote being made. The paper doesn't use the word "justified" anywhere, but I suspect that's where we got it from. In Casper FFG, my "justfication" is having seen evidence that 23\frac{2}{3} of validators like the same checkpoint as I do.

A vote in Casper FFG has two parts: a source checkpoint vote and a target checkpoint vote. These are the source and target fields in an attestation's data:

class AttestationData(Container):
    slot: Slot
    index: CommitteeIndex
    # LMD GHOST vote
    beacon_block_root: Root
    # FFG vote
    source: Checkpoint
    target: Checkpoint

Source and target votes are made simultaneously in the form of a link st{s \rightarrow t}, where ss is the source checkpoint and tt is the target checkpoint.

The role of my target vote is to broadcast my view of what the I think should be the next checkpoint to be justified. In the terms above, it is my round 1 vote. My target vote is a soft (conditional) commitment not to revert that checkpoint as long as I hear from 23\frac{2}{3} of validators that they also commit to that checkpoint.

The role of my source vote is to broadcast that I've seen support from 23\frac{2}{3} of the network for checkpoint ss, and that it is the most recent such checkpoint that I know about. In the terms above, it is my round 2 vote announcing the collective view that I've heard. By making this source vote I am upgrading my previous soft commitment not to revert the checkpoint to a hard (unconditional) commitment never to revert it.

A diagram illustrating how the votes from the two rounds are combined into one attestation.

Casper FFG combines the source and target votes into a single message: a vote for a link st{s \rightarrow t}.

An honest validator's source vote will always be the highest justified checkpoint in its view of the chain. Its target vote will be the checkpoint of the current epoch that descends from the source checkpoint. The source and the target checkpoints do not need to be consecutive; it is permissible to jump checkpoints. But when the beacon chain is running smoothly, the target vote of one epoch will be the source vote of the next epoch.

A diagram showing a valid link from source to target.

A link from a justified checkpoint to a checkpoint on a chain that descends from it is valid. Only checkpoints are shown here; the intermediate blocks have been omitted for clarity.

In a valid link, the source checkpoint will always be an ancestor of the target checkpoint. If it weren't so, I'd be contradicting myself: the source vote announces my commitment never to revert checkpoint ss; if the target checkpoint tt is not descended from ss then that would be a vote to revert ss. However, it is not a slashable offence to publish such an invalid link5.

A diagram showing an invalid link from source to a conflicting target.

A link from a justified checkpoint to a checkpoint on a chain that does not descend from it is invalid. The two checkpoints are said to be conflicting as neither descends from the other.

There are some specific criteria around timeliness that valid Casper FFG votes must satisfy in the Eth2 implementation. We'll discuss these more in the Gasper section as they don't apply to the abstract Casper FFG protocol that we're considering here.

When weighing Casper FFG votes, only votes that have been received in blocks are considered. Unlike with LMD GHOST's fork choice, we do not consider any Casper FFG votes received solely via gossip, by carrier pigeon, or by any other means. This is because we must always have a common record of our decision making around finality, and the block history provides that common record. So, when I said above, "I tell the network", it's shorthand for saying that I broadcast an attestation that will be picked up by a block proposer and included in a block. And when I said, "I hear from the network", it's shorthand for saying that I processed the attestations included in a block.

Also, allow me to reiterate that votes in Casper FFG are weighted according to validators' effective balances. A vote from a validator with a 24 ETH effective balance carries 75% of the weight of a vote from a validator with a 32 ETH effective balance. I will frequently say things like, "the votes of two-thirds of the validators"; you should understand this as, "the votes of validators managing two-thirds of the stake".

As described above, a link is a Casper FFG vote pair that links the source and target checkpoints, st{s \rightarrow t}.

A link st{s \rightarrow t} is a supermajority link when over 23\frac{2}{3} of the validators (weighted by stake) have published the same link (and had their votes included in blocks in a timely way).

The mechanics of Casper FFG

With (most of) the terminology and key concepts behind us, we can now look at how Casper FFG operates in detail. It is fairly straightforward.

Justification

When a node sees a supermajority link from justified checkpoint c1c_1 to checkpoint c2c_2, then it considers checkpoint c2c_2 to be justified.

A diagram showing that a supermajority link justifies a checkpoint.

My node has seen a supermajority link CNCN+2{C_N \rightarrow C_{N+2}}, therefore I mark CN+2C_{N+2} as justified. Justified checkpoints are hatched and marked with "J".

Justification means that I have seen commitments from over 23\frac{2}{3} of the validator set that they will not revert checkpoint c2c_2 as long as they get confirmation from at least 23\frac{2}{3} of validators that also will not revert c2c_2.

Finalisation

When a node sees a supermajority link from justified checkpoint c1c_1 to checkpoint c2c_2, and checkpoint c2c_2 is a direct child checkpoint of c1c_1, then it considers checkpoint c1c_1 to be finalised.

In other words, a finalised checkpoint is a justified checkpoint whose direct child is justified.

A diagram showing that when the direct child of a justified checkpoint is justified, the parent checkpoint is finalised.

My node has seen a supermajority link CNCN+1{C_N \rightarrow C_{N+1}}, therefore I mark CN+1C_{N+1} as justified. Since CN+1C_{N+1} is a direct child of CNC_N in the checkpoint tree, I also make CNC_N as finalised. Finalised checkpoints are cross-hatched and marked with "F".

Finalisation means that I have seen confirmation from over 23\frac{2}{3} of the validators that they have seen commitments from over 23\frac{2}{3} of the validators that they will not revert checkpoint c1c_1. Checkpoint c1c_1 cannot now be reverted without at least 13\frac{1}{3} of validators provably changing their minds, and therefore getting slashed.

The above rule for finalising a checkpoint is the one described in the original Casper FFG paper. In practice we can use a slight generalisation of this rule, without affecting the safety proof. The generalisation is called k-finality and we will look at it a little later.

The Casper commandments

In the Casper FFG paper, each checkpoint has a defined height: if cc is a checkpoint, then h(c)h(c) is the height of that checkpoint. Checkpoint heights increase monotonically with distance from the genesis block.

In the Eth2 implementation of Casper FFG, the checkpoint height is the checkpoint's epoch number: h(c)=c.epochh(c) = \texttt{c.epoch}. Recall that a checkpoint comprises both a block hash and an epoch number. If either of these differs then the checkpoints are different.

Casper FFG's proof of accountable safety relies on any validator that violates either of the following two rules, or "commandments", being slashed.

No double vote

Commandment 1: a validator must not publish distinct votes s1t1{s_1 \rightarrow t_1} and s2t2{s_2 \rightarrow t_2} such that h(t1)=h(t2)h(t_1) = h(t_2).

Simply put, a validator must make at most one vote for any target epoch.

A diagram showing a violation of the no double vote rule by voting for the same target checkpoint from different source checkpoints.

One way to violate the no double vote rule: voting for the same target checkpoint from different source checkpoints: 03{0 \rightarrow 3} and 13{1 \rightarrow 3}.

A diagram showing a violation of the no double vote rule by voting for target checkpoints at the same height on different branches.

Another way to violate the no double vote rule: voting for different target checkpoints in the same epoch: 03{0 \rightarrow 3} and 03{0 \rightarrow 3'}.

No surround vote

Commandment 2: a validator must not publish distinct votes s1t1{s_1 \rightarrow t_1} and s2t2{s_2 \rightarrow t_2} such that h(s1)<h(s2)<h(t2)<h(t1)h(s_1) < h(s_2) < h(t_2) < h(t_1).

That is, a validator must not make a vote such that its link either surrounds, or is surrounded by, a previous link it voted for.

A diagram showing a violation of the no surround vote rule on a single branch.

One way to violate the no surround vote rule: the link 03{0 \rightarrow 3} surrounds the link 12{1 \rightarrow 2}.

A diagram showing a violation of the no surround vote rule on conflicting branches.

Another way to violate the no surround vote rule: again, the link 03{0 \rightarrow 3} surrounds the link 12{1 \rightarrow 2}, albeit on different branches.

Slashing

Any validator that violates either of the Casper commandments is liable to being slashed. This means that some or all of its stake is removed and it is ejected from the validator set.

Slashing underpins the accountable safety guarantee of Casper by putting a price on bad behaviour - specifically, behaviour that could lead to conflicting checkpoints being finalised. Slashing also features in LMD GHOST, and the Incentive Layer chapter has more information on the detailed mechanics of slashing.

Violations of the commandments are potentially difficult to detect in-protocol. In particular, detection of surround votes might require searching a substantial history of validators' previous votes6. For this reason, we rely on external slashing detection services7 to detect slashing condition violations and submit the evidence to block proposers. There needs to be only one such service on the network, as long as it is reliable. In practice we have more, but it is certainly not necessary for every node operator to run a slashing detector.

Once evidence of breaking one of the commandments has been found, it is easy to prove on chain that the validator broke the rules. Validators sign every attestation they publish, so, given two conflicting attestations, it is a simple matter to verify their signatures and show that the validator broke the rules in publishing them.

The protocol as it is presented in the Casper FFG paper assumes that, on proof of having violated a slashing condition, "the validator's entire deposit is taken away". We have implemented a variation of this for Eth2 that scales the proportion of a validator's stake that is forfeit in proportion to total amount of stake slashed in a given period. If 13\frac{1}{3} of all stake violated slashing conditions within a 36 day window, then the whole stake would be forfeit, as in the classic Casper FFG protocol. But if there were very few other slashings in the window, then almost no stake would be forfeit. This nicety does not change the Casper FFG guarantees in practice, at least since the Bellatrix upgrade to the beacon chain gave the PROPORTIONAL_SLASHING_MULTIPLIER constant its final value.

Fork choice rule

Casper FFG's fork choice rule takes the form of a modification to the underlying consensus mechanism's fork choice rule. From the Casper FFG paper, the underlying consensus mechanism must,

follow the chain containing the justified checkpoint of the greatest height.

The pure LMD GHOST protocol always begins its search for the head block from the root of the chain, the genesis block. When modified by Casper FFG's fork choice rule, LMD GHOST will start its search for the head block from the highest justified checkpoint it knows about, and will ignore potential head blocks that do not descend from the highest justified checkpoint. We will discuss this more when we get to Gasper.

This modification to the underlying consensus protocol's fork choice rule is what confers finality. When a node has justified a checkpoint in its local view, it is committed to never reverting it. Therefore, the underlying chain must always include that checkpoint; all branches that do not include that checkpoint must be ignored.

Note that this fork choice rule is compatible with the plausible liveness guarantee of Casper FFG.

The guarantees of Casper FFG

The Casper FFG consensus protocol makes two guarantees that are analogous to, but different from, the concepts safety and liveness in classical consensus: accountable safety, and plausible liveness.

Accountable safety

Classical PBFT consensus can guarantee safety only when less than one-third of validators are adversarial (faulty). If more than one-third are adversarial then it makes no promises at all.

Casper FFG comes with essentially the same safety guarantee when validators controlling less than one-third of the stake are adversarial: finalised checkpoints will never be reverted. In addition, it provides the further guarantee that, if conflicting checkpoints are ever finalised, validators representing at least one-third of the staked Ether will be slashed. This is called "accountable safety". It is accountable in that we can identify exactly which validators behaved badly and punish them directly.

The extra safety this guarantee provides is not safety in the usual consensus protocol sense of the word, but it is safety of a specifically cryptoeconomic nature: bad behaviour is heavily disincentivised by the protocol. It is often referred to as "economic finality".

Proof of accountable safety

The proof of Casper FFG's accountable safety is reasonably intuitive. I will sketch out the proof below more-or-less as it is presented in the Casper FFG paper, but modified for epochs rather than the more abstract "checkpoint height" that the paper uses.

We will prove that, if fewer than 13\frac{1}{3} of validators (weighted by stake) violate a Casper commandment, two conflicting checkpoints cannot both be finalised. We will do this by showing that the only way a conflicting checkpoint could be finalised is for there to be a supermajority link that is surrounded by another supermajority link, which contradicts the assumption at start of this paragraph, since surround votes violate the second commandment.

The first handy observation we'll need is that, under this assumption, at most one checkpoint can be justified in any epoch (in the views of honest nodes). This follows directly from the no double vote commandment.

Let us take two conflicting finalised checkpoints ama_m and bnb_n in epochs mm and nn respectively. Since the checkpoints conflict, neither is a descendent of the other. From the previous observation, we know that mnm \ne n. Without loss of generality, we will take m<nm < n, so that bnb_n is the higher finalised checkpoint.

Now, there must be a continuous series of justified checkpoints leading from the root checkpoint to bnb_n with supermajority links between them. That is, there is a set of kk supermajority links {rbi1,bi1bi2,bi2bi3,,bik1bik}\{{r \rightarrow b_{i_1}},\allowbreak {b_{i_1} \rightarrow b_{i_2}},\allowbreak {b_{i_2} \rightarrow b_{i_3}},\allowbreak \ldots,\allowbreak {b_{i_{k-1}} \rightarrow b_{i_k}}\}, with ik=ni_k = n. This follows from the definition of justification. The set of justified checkpoints leading to bnb_n is thus, B={r,bi1,bi2,bi3,,bik1,bik}\mathcal{B} = \{r, b_{i_1},\allowbreak b_{i_2}, b_{i_3},\allowbreak \ldots,\allowbreak b_{i_{k-1}}, b_{i_k}\}. We can imagine bouncing along supermajority links from the root, landing on each of these checkpoints before jumping to the next.

A diagram showing a chain of justified checkpoints joined by contiguous supermajority links from the root up to a finalised block.

For any finalised checkpoint, such as b10b_{10}, there is a contiguous chain of supermajority links leading to it from the root, rr. The chain of links here justifies the set of checkpoints B={r,b1,b4,b5,b9,b10}\mathcal{B} = \{r, b_1,\allowbreak b_4, b_5,\allowbreak b_9, b_{10}\}. Shaded checkpoints are justified (and maybe finalised); cross-hatched checkpoints are finalised (also marked with "F").

Now consider conflicting finalised checkpoint ama_m. From the definition of finalisation, there must be a supermajority link amam+1{a_m \rightarrow a_{m+1}} from ama_m to am+1a_{m+1} in the next epoch. Clearly neither ama_m nor am+1a_{m+1} is in the set B\mathcal{B}, since that would make ama_m an ancestor of bnb_n and not conflicting. Also, set B\mathcal{B} contains no checkpoint bmb_m or bm+1b_{m+1} since we may have only one justified checkpoint in an epoch.

Given these observations, it is inevitable that the pair of checkpoints (am,am+1)(a_m, a_{m+1}) falls between the epochs of two consecutive elements of B\mathcal{B}, say bij1b_{i_{j-1}} and bijb_{i_j}. That is, there is a jj such that ij1<m<m+1<iji_{j-1} < m < m+1 < i_j