Part 3: Annotated Specification
def compute_shuffled_index(index: uint64, index_count: uint64, seed: Bytes32) -> uint64: """ Return the shuffled index corresponding to ``seed`` (and ``index_count``). """ assert index < index_count # Swap or not (https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf) # See the 'generalized domain' algorithm on page 3 for current_round in range(SHUFFLE_ROUND_COUNT): pivot = bytes_to_uint64(hash(seed + uint_to_bytes(uint8(current_round)))[0:8]) % index_count flip = (pivot + index_count - index) % index_count position = max(index, flip) source = hash( seed + uint_to_bytes(uint8(current_round)) + uint_to_bytes(uint32(position // 256)) ) byte = uint8(source[(position % 256) // 8]) bit = (byte >> (position % 8)) % 2 index = flip if bit else index return index
Selecting random, distinct committees of validators is a big part of Ethereum 2.0; it is foundational for both its scalability and security. This selection is done by shuffling.
Shuffling a list of objects is a well understood problem in computer science. Notice, however, that this routine manages to shuffle a single index to a new location, knowing only the total length of the list. To use the technical term for this, it is oblivious. To shuffle the whole list, this routine needs to be called once per validator index in the list. By construction, each input index maps to a distinct output index. Thus, when applied to all indices in the list, it results in a permutation, also called a shuffling.
Why do this rather than a simpler, more efficient, conventional shuffle? It's all about light clients. Beacon nodes will generally need to know the whole shuffling, but light clients will often be interested only in a small number of committees. Using this technique allows the composition of a single committee to be calculated without having to shuffle the entire set: potentially a big saving on time and memory.
As stated in the code comments, this is an implementation of the "swap-or-not" shuffle, described in the cited paper. Vitalik kicked off a search for a shuffle with these properties in late 2018. With the help of Professor Dan Boneh of Stanford University, the swap-or-not was identified as a candidate a couple of months later, and adopted into the spec.
The algorithm breaks down as follows. For each iteration (each round), we start with a current
- Pseudo-randomly select a pivot. This is a 64-bit integer based on the seed and current round number. This domain is large enough that any non-uniformity caused by taking the modulus in the next step is entirely negligible.
pivotto find another index in the list of validators,
flip, which is
pivot - indexaccounting for wrap-around in the list.
- Calculate a single pseudo-random bit based on the seed, the current round number, and some bytes from either
flipdepending on which is greater.
- If our bit is zero, we keep
indexunchanged; if it is one, we set
We are effectively swapping cards in a deck based on a deterministic algorithm.
The way that
position is broken down is worth noting:
- Bits 0-2 (3 bits) are used to select a single bit from the eight bits of
- Bits 3-7 (5 bits) are used to select a single byte from the thirty-two bytes of
- Bits 8-39 (32 bits) are used in generating
source. Note that the upper two bytes of this will always be zero in practice, due to limits on the number of active validators.
SHUFFLE_ROUND_COUNT is, and always has been, 90 in the mainnet configuration, as explained there.
See the section on Shuffling for a more structured exposition and analysis of this algorithm (with diagrams!).
In practice, full beacon node implementations will run this once per epoch using an optimised version that shuffles the whole list, and cache the result of that for the epoch.
def compute_proposer_index(state: BeaconState, indices: Sequence[ValidatorIndex], seed: Bytes32) -> ValidatorIndex: """ Return from ``indices`` a random index sampled by effective balance. """ assert len(indices) > 0 MAX_RANDOM_BYTE = 2**8 - 1 i = uint64(0) total = uint64(len(indices)) while True: candidate_index = indices[compute_shuffled_index(i % total, total, seed)] random_byte = hash(seed + uint_to_bytes(uint64(i // 32)))[i % 32] effective_balance = state.validators[candidate_index].effective_balance if effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte: return candidate_index i += 1
There is exactly one beacon block proposer per slot, selected randomly from among all the active validators. The seed parameter is set in
get_beacon_proposer_index based on the epoch and slot. Note that there is a small but finite probability of the same validator being called on to propose a block more than once in an epoch.
A validator's chance of being the proposer is weighted by its effective balance: a validator with a 32 Ether effective balance is twice as likely to be chosen as a validator with a 16 Ether effective balance.
To account for the need to weight by effective balance, this function implements as a try-and-increment algorithm. A counter
i starts at zero. This counter does double duty:
iis used to uniformly select a candidate proposer with probability where, is the number of active validators. This is done by using the
compute_shuffled_indexroutine to shuffle index
ito a new location, which is then the
iis used to generate a pseudo-random byte using the hash function as a seeded PRNG with at least 256 bits of output. The lower 5 bits of
iselect a byte in the hash function, and the upper bits salt the seed. (An obvious optimisation is that the output of the hash changes only once every 32 iterations.)
if test is where the weighting by effective balance is done. If the candidate has
MAX_EFFECTIVE_BALANCE, it will always pass this test and be returned as the proposer. If the candidate has a fraction of
MAX_EFFECTIVE_BALANCE then that fraction is the probability of being returned as proposer.
If the candidate is not chosen, then
i is incremented, and we try again. Since the minimum effective balance is half of the maximum, then this ought to terminate fairly swiftly. In the worst case, all validators have 16 Ether effective balance, so the chance of having to do another iteration is 50%, in which case there is a one in a million chance of having to do 20 iterations.
Note that this dependence on the validators' effective balances, which are updated at the end of each epoch, means that proposer assignments are valid only in the current epoch. This is different from attestation committee assignments, which are valid with a one epoch look-ahead.
def compute_committee(indices: Sequence[ValidatorIndex], seed: Bytes32, index: uint64, count: uint64) -> Sequence[ValidatorIndex]: """ Return the committee corresponding to ``indices``, ``seed``, ``index``, and committee ``count``. """ start = (len(indices) * index) // count end = (len(indices) * uint64(index + 1)) // count return [indices[compute_shuffled_index(uint64(i), uint64(len(indices)), seed)] for i in range(start, end)]
compute_committee is used by
get_beacon_committee() to find the specific members of one of the committees at a slot.
Every epoch, a fresh set of committees is generated; during an epoch, the committees are stable.
Looking at the parameters in reverse order:
countis the total number of committees in an epoch. This is
SLOTS_PER_EPOCHtimes the output of
indexis the committee number within the epoch, running from
count - 1. It is calculated in
get_beacon_committee()from the committee number in the slot
indexand the slot number as
(slot % SLOTS_PER_EPOCH) * committees_per_slot + index.
seedis the seed value for computing the pseudo-random shuffling, based on the epoch number and a domain parameter. (
indicesis the list of validators eligible for inclusion in committees, namely the whole list of indices of active validators.
Random sampling among the validators is done by taking a contiguous slice of array indices from
end and seeing where each one gets shuffled to by
compute_shuffled_index(). Note that
ValidatorIndex(i) is a type-cast in the above: it just turns
i into a ValidatorIndex type for input into the shuffling. The output value of the shuffling is then used as an index into the
indices list. There is much here that client implementations will optimise with caching and batch operations.
It may not be immediately obvious, but not all committees returned will be the same size (they can vary by one), and every validator in
indices will be a member of exactly one committee. As we increment
index from zero, clearly
index == j + 1 is
index == j, so there are no gaps. In addition, the highest
count - 1, so every validator in
indices finds its way into a committee.1
This method of selecting committees is light client friendly. Light clients can compute only the committees that they are interested in without needing to deal with the entire validator set. See the section on Shuffling for explanation of how this works.
Sync committees are assigned by a different process that is more akin to repeatedly performing
def compute_epoch_at_slot(slot: Slot) -> Epoch: """ Return the epoch number at ``slot``. """ return Epoch(slot // SLOTS_PER_EPOCH)
This is trivial enough that I won't explain it. But note that it does rely on
GENESIS_EPOCH being zero. The more pernickety among us might prefer it to read,
return GENESIS_EPOCH + Epoch((slot - GENESIS_SLOT) // SLOTS_PER_EPOCH)
def compute_start_slot_at_epoch(epoch: Epoch) -> Slot: """ Return the start slot of ``epoch``. """ return Slot(epoch * SLOTS_PER_EPOCH)
Maybe should read,
return GENESIS_SLOT + Slot((epoch - GENESIS_EPOCH) * SLOTS_PER_EPOCH))
def compute_activation_exit_epoch(epoch: Epoch) -> Epoch: """ Return the epoch during which validator activations and exits initiated in ``epoch`` take effect. """ return Epoch(epoch + 1 + MAX_SEED_LOOKAHEAD)
When queuing validators for activation or exit in
initiate_validator_exit() respectively, the activation or exit is delayed until the next epoch, plus
MAX_SEED_LOOKAHEAD epochs, currently 4.
MAX_SEED_LOOKAHEAD for the details, but in short it is designed to make it extremely hard for an attacker to manipulate the membership of committees via activations and exits.
def compute_fork_data_root(current_version: Version, genesis_validators_root: Root) -> Root: """ Return the 32-byte fork data root for the ``current_version`` and ``genesis_validators_root``. This is used primarily in signature domains to avoid collisions across forks/chains. """ return hash_tree_root(ForkData( current_version=current_version, genesis_validators_root=genesis_validators_root, ))
The fork data root serves as a unique identifier for the chain that we are on.
genesis_validators_root identifies our unique genesis event, and
current_version our own hard fork subsequent to that genesis event. This is useful, for example, to differentiate between a testnet and mainnet: both might have the same fork versions, but will definitely have different genesis validator roots.
It is used by
def compute_fork_digest(current_version: Version, genesis_validators_root: Root) -> ForkDigest: """ Return the 4-byte fork digest for the ``current_version`` and ``genesis_validators_root``. This is a digest primarily used for domain separation on the p2p layer. 4-bytes suffices for practical separation of forks/chains. """ return ForkDigest(compute_fork_data_root(current_version, genesis_validators_root)[:4])
Extracts the first four bytes of the fork data root as a
ForkDigest type. It is primarily used for domain separation on the peer-to-peer networking layer.
compute_fork_digest() is used extensively in the Ethereum 2.0 networking specification to distinguish between independent beacon chain networks or forks: it is important that activity on one chain does not interfere with other chains.
def compute_domain(domain_type: DomainType, fork_version: Version=None, genesis_validators_root: Root=None) -> Domain: """ Return the domain for the ``domain_type`` and ``fork_version``. """ if fork_version is None: fork_version = GENESIS_FORK_VERSION if genesis_validators_root is None: genesis_validators_root = Root() # all bytes zero by default fork_data_root = compute_fork_data_root(fork_version, genesis_validators_root) return Domain(domain_type + fork_data_root[:28])
When dealing with signed messages, the signature "domains" are separated according to three independent factors:
- All signatures include a
DomainTyperelevant to the message's purpose, which is just some cryptographic hygiene in case the same message is to be signed for different purposes at any point.
- All but signatures on deposit messages include the fork version. This ensures that messages across different forks of the chain become invalid, and that validators won't be slashed for signing attestations on two different chains (this is allowed).
- And, now, the root hash of the validator Merkle tree at Genesis is included. Along with the fork version this gives a unique identifier for our chain.
This function is mainly used by
get_domain(). It is also used in deposit processing, in which case
genesis_validators_root take their default values since deposits are valid across forks.
Fun fact: this function looks pretty simple, but I found a subtle bug in the way tests were generated in a previous implementation.
def compute_signing_root(ssz_object: SSZObject, domain: Domain) -> Root: """ Return the signing root for the corresponding signing data. """ return hash_tree_root(SigningData( object_root=hash_tree_root(ssz_object), domain=domain, ))
This is a pre-processor for signing objects with BLS signatures:
- calculate the hash tree root of the object;
- combine the hash tree root with the
Domaininside a temporary
- return the hash tree root of that, which is the data to be signed.
domain is usually the output of
get_domain(), which mixes in the cryptographic domain, the fork version, and the genesis validators root to the message hash. For deposits, it is the output of
compute_domain(), ignoring the fork version and genesis validators root.
This is exactly equivalent to adding the domain to an object and taking the hash tree root of the whole thing. Indeed, this function used to be called
|Used by||Many places|
Note: This function is unsafe with respect to overflows and underflows.
def compute_timestamp_at_slot(state: BeaconState, slot: Slot) -> uint64: slots_since_genesis = slot - GENESIS_SLOT return uint64(state.genesis_time + slots_since_genesis * SECONDS_PER_SLOT)
A simple utility for calculating the Unix timestamp at the start of the given slot. This is used when validating execution payloads.
This function was added in the Bellatrix pre-Merge upgrade.
- Also not immediately obvious is that there is a subtle issue with committee sizes that was discovered by formal verification, although, given the max supply of ETH it will never be triggered.↩