Part 2: Technical Overview
- Casper FFG adds finality to the Eth2 consensus protocol.
- It acts as an overlay on LMD GHOST consensus that modifies its fork choice rule.
- Casper FFG is classically safe under asynchrony when less than of validators are faulty or adversarial.
- Moreover, slashing allows Casper FFG to provide accountable safety, also known as economic finality, when more than are adversarial.
Many articles have been written to explain the basic mechanics of Casper FFG – how it works – but there is very little about why it works. I hope that by the end of this section you will feel that you have some insight into why Casper FFG is effective.
The mechanics of Casper FFG are not very complicated. Even so, as you read on, bear in mind that the effectiveness of Casper FFG really comes down to two big ideas. First, the two-phase commit (justification and finalisation) and, second, accountable safety.
The two-phase commit gives Casper FFG classical consensus safety. It makes it possible to declare blocks final, being certain that no honest validator will ever revert them. But that's enforceable only when over two-thirds of the stake is controlled by honest validators, something we cannot always assume.
On top of this, Casper FFG offers an extra guarantee called accountable safety for situations when over one-third of the validators are dishonest. If the chain ever suffers from conflicting finality, at least one-third of the total amount of staked Ether will be burnt. This is enforced by slashing validators that break either of the two Casper commandments.
Casper FFG is a kind of meta-consensus protocol. It is an overlay that can be run on top of an underlying consensus protocol in order to add finality to it. Recall that finality is the property that there are blocks in the chain that are guaranteed never to be reverted: they will be part of the chain forever. In Ethereum's proof of stake consensus, the underlying protocol is LMD GHOST which does not provide finality - there is always a chance that validators might decide to build a competing chain, and there is no real penalty for doing so. Casper FFG functions as a "finality gadget", and we use it to add finality to LMD GHOST.
Casper FFG takes advantage of the fact that, as a proof of stake protocol, we know who our participants are: the validators that manage the staked Ether. This means that we can use vote counting to judge when we have seen a majority of the votes of honest validators. More precisely, votes from validators managing a majority of the stake - in everything that follows, every validator's vote is weighted by the value of the stake that it manages, but for simplicity we won't spell it out every time.
We operate on an asynchronous network, the Internet, which means that we can tolerate at most of the validators being adversarial (or faulty) if we want to achieve both safety and liveness. This is a well-known result in consensus theory1, and the reasoning goes as follows.
- We have a total of validators, of which a number, , may be faulty or adversarial in some way.
- To preserve liveness, we need to be able to make a decision after hearing from only validators, since the faulty ones might withhold their votes.
- But this is an asynchronous environment, so the non-responders may simply be delayed, and not faulty at all.
- Therefore we must account for up to of the responses we have received being from faulty or adversarial validators.
- To guarantee that we can always achieve a simple majority of honest validators after hearing from validators, we require that . That is, .
To summarise, like all classical Byzantine fault tolerant (BFT) protocols, Casper FFG is able to provide finality when less than one-third of the total validator set is faulty or adversarial. When sufficient honest validators have declared a block finalised, all honest validators will follow, and that block will not be reverted. However, as we shall see, unlike classical BFT protocols, Casper FFG is also able to provide economic finality (accountable safety) even when more than one-third of the validator set is faulty or adversarial.
In this section, we will consider Casper FFG on its own terms, without spending much time on how it is integrated with LMD GHOST. This is in the spirit of the Casper FFG paper which has very little to say about the underlying blockchain consensus mechanism. We will look at how the two are combined into Gasper in the next section.
Once again, the name Casper FFG comprises two parts, and it's worth looking at them both.
The Casper part of the name seems to be due to Vlad Zamfir. He explains in Part 5 of his History of Casper as follows.
In this chapter I recount the story of Casper’s birth as an application of the principles of Aviv Zohar and Jonatan Sompolinsky’s GHOST to proof-of-stake.
I called it "the friendly ghost" because of incentives designed to guarantee censorship resistance against the oligopolists: incentives that force the cartel to be friendly to non-cartel validators.
The GHOST protocol he mentions is the same as we looked at in the previous section. It helps to make sense of all this if you share the cultural context that Casper the Friendly Ghost is a cartoon character that's been around since the 1940s.
Zamfir's protocol was initially called Casper TFG (The Friendly Ghost), and later renamed Casper CBC (Correct By Construction). Vitalik's Casper FFG grew up alongside Zamfir's Casper TFG/CBC – which presumably explains the naming synergy – but has very little in common with it. Indeed, Casper FFG doesn't even use the GHOST protocol.2
The FFG part stands for the "Friendly Finality Gadget", as in the title of the Casper FFG paper.
This is clearly a play on Zamfir's TFG name, but also indicates that Casper FFG is not a fully self-contained blockchain protocol, but a "gadget" than can add finality to an underlying consensus protocol.
As ever, the jargon that we use is the way into understanding how the protocol is constructed.
In order to come to a decision about finality, the Casper FFG mechanism needs to process votes from at least of the validator set. In Ethereum, the validator set is potentially very large, and it is infeasible for votes from several hundred thousand validators to be broadcast, gossiped, and processed simultaneously.
To work around this, voting is spread out through the duration of an epoch3, which, in Eth2, is 32 slots of 12 seconds each. At each slot, of the total validator set is scheduled to broadcast a vote, so each validator is scheduled to cast a vote exactly once per epoch. For efficiency, we bundle each validator's Casper FFG vote with its LMD GHOST vote, although that's by no means necessary.
To ensure that validators voting at different times during the epoch have something in common to vote for, we make them vote for a checkpoint, which is the first slot of an epoch. The checkpoint in epoch is at slot number (remembering that slots and epochs are numbered from zero).
As an aside, people often talk lazily of finalising epochs, but that's not correct. Casper FFG finalises checkpoints, the first slots of epochs. When we have finalised the checkpoint in epoch , we have finalised everything up to and including slot . This includes all of epoch and the first slot of epoch . But we have not yet finalised epoch - it still has 31 unfinalised slots in it.
For the time-being we will assume that every slot has a block in it. This is because the original Casper FFG's checkpoints are based on block heights rather than slot numbers. We will relax this assumption to allow empty slots and checkpoints when we look at Gasper.
Within the protocol, a
Checkpoint object simply contains the epoch number of the checkpoint, and the hash tree root of the head block in the epoch's first slot (
class Checkpoint(Container): epoch: Epoch root: Root
Like classical BFT consensus protocols, Casper FFG achieves finality through a two-round process.
In the first round, I broadcast my view of the current epoch's checkpoint (, say) to rest of the network, and I hear from the rest of the network what their view is. If a supermajority tells me that they also support , that allows me to justify it. Justification is local to my network view: at this stage, I believe that the majority of the network believes that is favourable for finalisation. But I don't yet know that the rest of the network has come to the same conclusion. Under adversarial conditions, it is possible that a sufficient majority of the other validators failed to come to a decision on . We shall look at such a scenario later.
In the second round, I broadcast the fact that I've heard from a supermajority of validators that they support (that is, that I've justified ) and I hear from the rest of the network whether they believe that a supermajority of validators supports (that is, that they have justified ). If I hear that a supermajority of validators agrees with me that is justified, then I will finalise . Finalisation is a global property: once a checkpoint is finalised, I know that no honest validator will ever revert it. Even if they haven't yet marked the checkpoint as finalised in their view, I know that they've at least marked it justified, and that there is no (non-slashable) behaviour that will be able to revert that justification.
To summarise, for me to be absolutely certain that the whole network agrees that a block will not be reverted requires the following steps.4
- Round 1 (ideally leading to justification):
- I tell the network what I think is the best checkpoint.
- I hear from the network what all the other validators think is the best checkpoint.
- If I hear that of the validators agree with me, I justify the checkpoint.
- Round 2 (ideally leading to finalisation):
- I tell the network my justified checkpoint, the collective view I gained from round 1.
- I hear from the network what all the other validators think the collective view is, their justified checkpoints.
- If I hear that of the validators agree with me, I finalise the checkpoint.
In short, when I justify a checkpoint I make a commitment never to revert it. When I finalise a checkpoint I know that all honest validators are committed to never reverting it.
Under ideal conditions, each round lasts an epoch, so it takes an epoch to justify a checkpoint and a further epoch to finalise a checkpoint. At the start of epoch we are aiming to have justified checkpoint and to have finalised checkpoint .
Quantifying that, it takes 12.8 minutes, two epochs, to finalise a checkpoint in-protocol. In Casper FFG, the two rounds are overlapped and pipelined, so that, although it takes 12.8 minutes from end to end to finalise a checkpoint, we can finalise a checkpoint every 6.4 minutes, once per epoch.
Note that it can be possible from outside the protocol to see that a checkpoint is likely to be finalised a little earlier than the full 12.8 minutes, assuming that there is no long chain reorg. Specifically, it is possible to have collected enough votes by of the way through the second round, that is after about 11 minutes. However, in-protocol justification and finalisation is done only during end of epoch processing.
An aside on the nomenclature: the terms "finalised" and "justified" do not appear in the classical consensus literature. It's easy to see where "finalised" comes from, but perhaps not so for "justified", which is frankly a peculiar term to be using here. As far as I can tell, its origins are in Vlad Zamfir's Casper TFG protocol. In that work, messages contain a "justification" to support the vote being made. The paper doesn't use the word "justified" anywhere, but I suspect that's where we got it from. In Casper FFG, my "justfication" is having seen evidence that of validators like the same checkpoint as I do.
A vote in Casper FFG has two parts: a source checkpoint vote and a target checkpoint vote. These are the
target fields in an attestation's data:
class AttestationData(Container): slot: Slot index: CommitteeIndex # LMD GHOST vote beacon_block_root: Root # FFG vote source: Checkpoint target: Checkpoint
Source and target votes are made simultaneously in the form of a link , where is the source checkpoint and is the target checkpoint.
The role of my target vote is to broadcast my view of what the I think should be the next checkpoint to be justified. In the terms above, it is my round 1 vote. My target vote is a soft (conditional) commitment not to revert that checkpoint as long as I hear from of validators that they also commit to that checkpoint.
The role of my source vote is to broadcast that I've seen support from of the network for checkpoint , and that it is the most recent such checkpoint that I know about. In the terms above, it is my round 2 vote announcing the collective view that I've heard. By making this source vote I am upgrading my previous soft commitment not to revert the checkpoint to a hard (unconditional) commitment never to revert it.
An honest validator's source vote will always be the highest justified checkpoint in its view of the chain. Its target vote will be the checkpoint of the current epoch that descends from the source checkpoint. The source and the target checkpoints do not need to be consecutive; it is permissible to jump checkpoints. But when the beacon chain is running smoothly, the target vote of one epoch will be the source vote of the next epoch.
In a valid link, the source checkpoint will always be an ancestor of the target checkpoint. If it weren't so, I'd be contradicting myself: the source vote announces my commitment never to revert checkpoint ; if the target checkpoint is not descended from then that would be a vote to revert . However, it is not a slashable offence to publish such an invalid link5.
There are some specific criteria around timeliness that valid Casper FFG votes must satisfy in the Eth2 implementation. We'll discuss these more in the Gasper section as they don't apply to the abstract Casper FFG protocol that we're considering here.
When weighing Casper FFG votes, only votes that have been received in blocks are considered. Unlike with LMD GHOST's fork choice, we do not consider any Casper FFG votes received solely via gossip, by carrier pigeon, or by any other means. This is because we must always have a common record of our decision making around finality, and the block history provides that common record. So, when I said above, "I tell the network", it's shorthand for saying that I broadcast an attestation that will be picked up by a block proposer and included in a block. And when I said, "I hear from the network", it's shorthand for saying that I processed the attestations included in a block.
Also, allow me to reiterate that votes in Casper FFG are weighted according to validators' effective balances. A vote from a validator with a 24 ETH effective balance carries 75% of the weight of a vote from a validator with a 32 ETH effective balance. I will frequently say things like, "the votes of two-thirds of the validators"; you should understand this as, "the votes of validators managing two-thirds of the stake".
As described above, a link is a Casper FFG vote pair that links the source and target checkpoints, .
A link is a supermajority link when over of the validators (weighted by stake) have published the same link (and had their votes included in blocks in a timely way).
With (most of) the terminology and key concepts behind us, we can now look at how Casper FFG operates in detail. It is fairly straightforward.
When a node sees a supermajority link from justified checkpoint to checkpoint , then it considers checkpoint to be justified.
Justification means that I have seen commitments from over of the validator set that they will not revert checkpoint as long as they get confirmation from at least of validators that also will not revert .
When a node sees a supermajority link from justified checkpoint to checkpoint , and checkpoint is a direct child checkpoint of , then it considers checkpoint to be finalised.
In other words, a finalised checkpoint is a justified checkpoint whose direct child is justified.
Finalisation means that I have seen confirmation from over of the validators that they have seen commitments from over of the validators that they will not revert checkpoint . Checkpoint cannot now be reverted without at least of validators provably changing their minds, and therefore getting slashed.
The above rule for finalising a checkpoint is the one described in the original Casper FFG paper. In practice we can use a slight generalisation of this rule, without affecting the safety proof. The generalisation is called k-finality and we will look at it a little later.
In the Casper FFG paper, each checkpoint has a defined height: if is a checkpoint, then is the height of that checkpoint. Checkpoint heights increase monotonically with distance from the genesis block.
In the Eth2 implementation of Casper FFG, the checkpoint height is the checkpoint's epoch number: . Recall that a checkpoint comprises both a block hash and an epoch number. If either of these differs then the checkpoints are different.
Casper FFG's proof of accountable safety relies on any validator that violates either of the following two rules, or "commandments", being slashed.
Commandment 1: a validator must not publish distinct votes and such that .
Simply put, a validator must make at most one vote for any target epoch.
Commandment 2: a validator must not publish distinct votes and such that .
That is, a validator must not make a vote such that its link either surrounds, or is surrounded by, a previous link it voted for.
Any validator that violates either of the Casper commandments is liable to being slashed. This means that some or all of its stake is removed and it is ejected from the validator set.
Slashing underpins the accountable safety guarantee of Casper by putting a price on bad behaviour - specifically, behaviour that could lead to conflicting checkpoints being finalised. Slashing also features in LMD GHOST, and the Incentive Layer chapter has more information on the detailed mechanics of slashing.
Violations of the commandments are potentially difficult to detect in-protocol. In particular, detection of surround votes might require searching a substantial history of validators' previous votes6. For this reason, we rely on external slashing detection services7 to detect slashing condition violations and submit the evidence to block proposers. There needs to be only one such service on the network, as long as it is reliable. In practice we have more, but it is certainly not necessary for every node operator to run a slashing detector.
Once evidence of breaking one of the commandments has been found, it is easy to prove on chain that the validator broke the rules. Validators sign every attestation they publish, so, given two conflicting attestations, it is a simple matter to verify their signatures and show that the validator broke the rules in publishing them.
The protocol as it is presented in the Casper FFG paper assumes that, on proof of having violated a slashing condition, "the validator's entire deposit is taken away". We have implemented a variation of this for Eth2 that scales the proportion of a validator's stake that is forfeit in proportion to total amount of stake slashed in a given period. If of all stake violated slashing conditions within a 36 day window, then the whole stake would be forfeit, as in the classic Casper FFG protocol. But if there were very few other slashings in the window, then almost no stake would be forfeit. This nicety does not change the Casper FFG guarantees in practice, at least since the Bellatrix upgrade to the beacon chain gave the
PROPORTIONAL_SLASHING_MULTIPLIER constant its final value.
Casper FFG's fork choice rule takes the form of a modification to the underlying consensus mechanism's fork choice rule. From the Casper FFG paper, the underlying consensus mechanism must,
follow the chain containing the justified checkpoint of the greatest height.
The pure LMD GHOST protocol always begins its search for the head block from the root of the chain, the genesis block. When modified by Casper FFG's fork choice rule, LMD GHOST will start its search for the head block from the highest justified checkpoint it knows about, and will ignore potential head blocks that do not descend from the highest justified checkpoint. We will discuss this more when we get to Gasper.
This modification to the underlying consensus protocol's fork choice rule is what confers finality. When a node has justified a checkpoint in its local view, it is committed to never reverting it. Therefore, the underlying chain must always include that checkpoint; all branches that do not include that checkpoint must be ignored.
Note that this fork choice rule is compatible with the plausible liveness guarantee of Casper FFG.
The Casper FFG consensus protocol makes two guarantees that are analogous to, but different from, the concepts safety and liveness in classical consensus: accountable safety, and plausible liveness.
Classical PBFT consensus can guarantee safety only when less than one-third of validators are adversarial (faulty). If more than one-third are adversarial then it makes no promises at all.
Casper FFG comes with essentially the same safety guarantee when validators controlling less than one-third of the stake are adversarial: finalised checkpoints will never be reverted. In addition, it provides the further guarantee that, if conflicting checkpoints are ever finalised, validators representing at least one-third of the staked Ether will be slashed. This is called "accountable safety". It is accountable in that we can identify exactly which validators behaved badly and punish them directly.
The extra safety this guarantee provides is not safety in the usual consensus protocol sense of the word, but it is safety of a specifically cryptoeconomic nature: bad behaviour is heavily disincentivised by the protocol. It is often referred to as "economic finality".
The proof of Casper FFG's accountable safety is reasonably intuitive. I will sketch out the proof below more-or-less as it is presented in the Casper FFG paper, but modified for epochs rather than the more abstract "checkpoint height" that the paper uses.
We will prove that, if fewer than of validators (weighted by stake) violate a Casper commandment, two conflicting checkpoints cannot both be finalised. We will do this by showing that the only way a conflicting checkpoint could be finalised is for there to be a supermajority link that is surrounded by another supermajority link, which contradicts the assumption at start of this paragraph, since surround votes violate the second commandment.
The first handy observation we'll need is that, under this assumption, at most one checkpoint can be justified in any epoch (in the views of honest nodes). This follows directly from the no double vote commandment.
Let us take two conflicting finalised checkpoints and in epochs and respectively. Since the checkpoints conflict, neither is a descendent of the other. From the previous observation, we know that . Without loss of generality, we will take , so that is the higher finalised checkpoint.
Now, there must be a continuous series of justified checkpoints leading from the root checkpoint to with supermajority links between them. That is, there is a set of supermajority links , with . This follows from the definition of justification. The set of justified checkpoints leading to is thus, . We can imagine bouncing along supermajority links from the root, landing on each of these checkpoints before jumping to the next.
Now consider conflicting finalised checkpoint . From the definition of finalisation, there must be a supermajority link from to in the next epoch. Clearly neither nor is in the set , since that would make an ancestor of and not conflicting. Also, set contains no checkpoint or since we may have only one justified checkpoint in an epoch.
Given these observations, it is inevitable that the pair of checkpoints falls between the epochs of two consecutive elements of , say and . That is, there is a such that