Part 3: Annotated Specification

Helper Functions

Misc

`compute_shuffled_index`

def compute_shuffled_index(index: uint64, index_count: uint64, seed: Bytes32) -> uint64:
    """
    Return the shuffled index corresponding to ``seed`` (and ``index_count``).
    """
    assert index < index_count

    # Swap or not (https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf)
    # See the 'generalized domain' algorithm on page 3
    for current_round in range(SHUFFLE_ROUND_COUNT):
        pivot = bytes_to_uint64(hash(seed + uint_to_bytes(uint8(current_round)))[0:8]) % index_count
        flip = (pivot + index_count - index) % index_count
        position = max(index, flip)
        source = hash(
            seed
            + uint_to_bytes(uint8(current_round))
            + uint_to_bytes(uint32(position // 256))
        )
        byte = uint8(source[(position % 256) // 8])
        bit = (byte >> (position % 8)) % 2
        index = flip if bit else index

    return index

Selecting random, distinct committees of validators is a big part of Ethereum 2.0; it is foundational for both its scalability and security. This selection is done by shuffling.

Shuffling a list of objects is a well understood problem in computer science. Notice, however, that this routine manages to shuffle a single index to a new location, knowing only the total length of the list. To use the technical term for this, it is oblivious. To shuffle the whole list, this routine needs to be called once per validator index in the list. By construction, each input index maps to a distinct output index. Thus, when applied to all indices in the list, it results in a permutation, also called a shuffling.

Why do this rather than a simpler, more efficient, conventional shuffle? It's all about light clients. Beacon nodes will generally need to know the whole shuffling, but light clients will often be interested only in a small number of committees. Using this technique allows the composition of a single committee to be calculated without having to shuffle the entire set: potentially a big saving on time and memory.

As stated in the code comments, this is an implementation of the "swap-or-not" shuffle, described in the cited paper. Vitalik kicked off a search for a shuffle with these properties in late 2018. With the help of Professor Dan Boneh of Stanford University, the swap-or-not was identified as a candidate a couple of months later, and adopted into the spec.

The algorithm breaks down as follows. For each iteration (each round), we start with a current index.

Pseudo-randomly select a pivot. This is a 64-bit integer based on the seed and current round number. This domain is large enough that any non-uniformity caused by taking the modulus in the next step is entirely negligible.
Use pivot to find another index in the list of validators, flip, which is pivot - index accounting for wrap-around in the list.
Calculate a single pseudo-random bit based on the seed, the current round number, and some bytes from either index or flip depending on which is greater.
If our bit is zero, we keep index unchanged; if it is one, we set index to flip.

We are effectively swapping cards in a deck based on a deterministic algorithm.

The way that position is broken down is worth noting:

Bits 0-2 (3 bits) are used to select a single bit from the eight bits of byte.
Bits 3-7 (5 bits) are used to select a single byte from the thirty-two bytes of source.
Bits 8-39 (32 bits) are used in generating source. Note that the upper two bytes of this will always be zero in practice, due to limits on the number of active validators.

SHUFFLE_ROUND_COUNT is, and always has been, 90 in the mainnet configuration, as explained there.

See the section on Shuffling for a more structured exposition and analysis of this algorithm (with diagrams!).

In practice, full beacon node implementations will run this once per epoch using an optimised version that shuffles the whole list, and cache the result of that for the epoch.


Used by	`compute_committee()`, `compute_proposer_index()`, `get_next_sync_committee_indices()`
Uses	`bytes_to_uint64()`
See also	`SHUFFLE_ROUND_COUNT`

`compute_proposer_index`

def compute_proposer_index(state: BeaconState, indices: Sequence[ValidatorIndex], seed: Bytes32) -> ValidatorIndex:
    """
    Return from ``indices`` a random index sampled by effective balance.
    """
    assert len(indices) > 0
    MAX_RANDOM_BYTE = 2**8 - 1
    i = uint64(0)
    total = uint64(len(indices))
    while True:
        candidate_index = indices[compute_shuffled_index(i % total, total, seed)]
        random_byte = hash(seed + uint_to_bytes(uint64(i // 32)))[i % 32]
        effective_balance = state.validators[candidate_index].effective_balance
        if effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
            return candidate_index
        i += 1

There is exactly one beacon block proposer per slot, selected randomly from among all the active validators. The seed parameter is set in get_beacon_proposer_index based on the epoch and slot. Note that there is a small but finite probability of the same validator being called on to propose a block more than once in an epoch.

A validator's chance of being the proposer is weighted by its effective balance: a validator with a 32 Ether effective balance is twice as likely to be chosen as a validator with a 16 Ether effective balance.

To account for the need to weight by effective balance, this function implements as a try-and-increment algorithm. A counter i starts at zero. This counter does double duty:

First i is used to uniformly select a candidate proposer with probability $1/N$ where, $N$ is the number of active validators. This is done by using the compute_shuffled_index routine to shuffle index i to a new location, which is then the candidate_index.
Then i is used to generate a pseudo-random byte using the hash function as a seeded PRNG with at least 256 bits of output. The lower 5 bits of i select a byte in the hash function, and the upper bits salt the seed. (An obvious optimisation is that the output of the hash changes only once every 32 iterations.)

The if test is where the weighting by effective balance is done. If the candidate has MAX_EFFECTIVE_BALANCE, it will always pass this test and be returned as the proposer. If the candidate has a fraction of MAX_EFFECTIVE_BALANCE then that fraction is the probability of being returned as proposer.

If the candidate is not chosen, then i is incremented and we try again. Since the minimum effective balance is half of the maximum, then this ought to terminate fairly swiftly. In the worst case, all validators have 16 Ether effective balance and the chance of having to do another iteration is 50%, in which case there is a one in a million chance of having to do 20 iterations.

Note that this dependence on the validators' effective balances, which are updated at the end of each epoch, means that proposer assignments are valid only in the current epoch. This is different from attestation committee assignments, which are valid with a one epoch look-ahead.


Used by	`get_beacon_proposer_index()`
Uses	`compute_shuffled_index()`
See also	`MAX_EFFECTIVE_BALANCE`

`compute_committee`

def compute_committee(indices: Sequence[ValidatorIndex],
                      seed: Bytes32,
                      index: uint64,
                      count: uint64) -> Sequence[ValidatorIndex]:
    """
    Return the committee corresponding to ``indices``, ``seed``, ``index``, and committee ``count``.
    """
    start = (len(indices) * index) // count
    end = (len(indices) * uint64(index + 1)) // count
    return [indices[compute_shuffled_index(uint64(i), uint64(len(indices)), seed)] for i in range(start, end)]

compute_committee is used by get_beacon_committee() to find the specific members of one of the committees at a slot.

Every epoch, a fresh set of committees is generated; during an epoch, the committees are stable.

Looking at the parameters in reverse order:

count is the total number of committees in an epoch. This is SLOTS_PER_EPOCH times the output of get_committee_count_per_slot().
index is the committee number within the epoch, running from 0 to count - 1. It is calculated in (get_beacon_committee() from the committee number in the slot index and the slot number as (slot % SLOTS_PER_EPOCH) * committees_per_slot + index.
seed is the seed value for computing the pseudo-random shuffling, based on the epoch number and a domain parameter (get_beacon_committee() uses DOMAIN_BEACON_ATTESTER).
indices is the list of validators eligible for inclusion in committees, namely the whole list of indices of active validators.

Random sampling among the validators is done by taking a contiguous slice of array indices from start to end and seeing where each one gets shuffled to by compute_shuffled_index(). Note that ValidatorIndex(i) is a type-cast in the above: it just turns i into a ValidatorIndex type for input into the shuffling. The output value of the shuffling is then used as an index into the indices list. There is much here that client implementations will optimise with caching and batch operations.

It may not be immediately obvious, but not all committees returned will be the same size (they can vary by one), and every validator in indices will be a member of exactly one committee. As we increment index from zero, clearly start for index == j + 1 is end for index == j, so there are no gaps. In addition, the highest index is count - 1, so every validator in indices finds its way into a committee.¹

This method of selecting committees is light client friendly. Light clients can compute only the committees that they are interested in without needing to deal with the entire validator set. See the section on Shuffling for explanation of how this works.

Sync committees are assigned by a different process that is more akin to repeatedly performing compute_proposer_index().


Used by	`get_beacon_committee`
Uses	`compute_shuffled_index()`

`compute_epoch_at_slot`

def compute_epoch_at_slot(slot: Slot) -> Epoch:
    """
    Return the epoch number at ``slot``.
    """
    return Epoch(slot // SLOTS_PER_EPOCH)

This is trivial enough that I won't explain it. But note that it does rely on GENESIS_SLOT and GENESIS_EPOCH being zero. The more pernickety among us might prefer it to read,

    return GENESIS_EPOCH + Epoch((slot - GENESIS_SLOT) // SLOTS_PER_EPOCH)

`compute_start_slot_at_epoch`

def compute_start_slot_at_epoch(epoch: Epoch) -> Slot:
    """
    Return the start slot of ``epoch``.
    """
    return Slot(epoch * SLOTS_PER_EPOCH)

Maybe should read,

    return GENESIS_SLOT + Slot((epoch - GENESIS_EPOCH) * SLOTS_PER_EPOCH))


Used by	`get_block_root()`
See also	`SLOTS_PER_EPOCH`, `GENESIS_SLOT`, `GENESIS_EPOCH`

`compute_activation_exit_epoch`

def compute_activation_exit_epoch(epoch: Epoch) -> Epoch:
    """
    Return the epoch during which validator activations and exits initiated in ``epoch`` take effect.
    """
    return Epoch(epoch + 1 + MAX_SEED_LOOKAHEAD)

When queuing validators for activation or exit in process_registry_updates() and initiate_validator_exit() respectively, the activation or exit is delayed until the next epoch, plus MAX_SEED_LOOKAHEAD epochs, currently 4.

See MAX_SEED_LOOKAHEAD for the details, but in short it is designed to make it extremely hard for an attacker to manipulate the make up of committees via activations and exits.


Used by	`initiate_validator_exit()`, `process_registry_updates()`
See also	`MAX_SEED_LOOKAHEAD`

`compute_fork_data_root`

def compute_fork_data_root(current_version: Version, genesis_validators_root: Root) -> Root:
    """
    Return the 32-byte fork data root for the ``current_version`` and ``genesis_validators_root``.
    This is used primarily in signature domains to avoid collisions across forks/chains.
    """
    return hash_tree_root(ForkData(
        current_version=current_version,
        genesis_validators_root=genesis_validators_root,
    ))

The fork data root serves as a unique identifier for the chain that we are on. genesis_validators_root identifies our unique genesis event, and current_version our own hard fork subsequent to that genesis event. This is useful, for example, to differentiate between a testnet and mainnet: both might have the same fork versions, but will definitely have different genesis validator roots.

It is used by compute_fork_digest() and compute_domain().


Used by	`compute_fork_digest()`, `compute_domain()`
Uses	`hash_tree_root()`
See also	`ForkData`

`compute_fork_digest`

def compute_fork_digest(current_version: Version, genesis_validators_root: Root) -> ForkDigest:
    """
    Return the 4-byte fork digest for the ``current_version`` and ``genesis_validators_root``.
    This is a digest primarily used for domain separation on the p2p layer.
    4-bytes suffices for practical separation of forks/chains.
    """
    return ForkDigest(compute_fork_data_root(current_version, genesis_validators_root)[:4])

Extracts the first four bytes of the fork data root as a ForkDigest type. It is primarily used for domain separation on the peer-to-peer networking layer.

compute_fork_digest() is used extensively in the Ethereum 2.0 networking specification to distinguish between independent beacon chain networks or forks: it is important that activity on one chain does not interfere with other chains.


Uses	`compute_fork_data_root()`
See also	`ForkDigest`

`compute_domain`

def compute_domain(domain_type: DomainType, fork_version: Version=None, genesis_validators_root: Root=None) -> Domain:
    """
    Return the domain for the ``domain_type`` and ``fork_version``.
    """
    if fork_version is None:
        fork_version = GENESIS_FORK_VERSION
    if genesis_validators_root is None:
        genesis_validators_root = Root()  # all bytes zero by default
    fork_data_root = compute_fork_data_root(fork_version, genesis_validators_root)
    return Domain(domain_type + fork_data_root[:28])

When dealing with signed messages, the signature "domains" are separated according to three independent factors:

All signatures include a DomainType relevant to the message's purpose, which is just some cryptographic hygiene in case the same message is to be signed for different purposes at any point.
All but signatures on deposit messages include the fork version. This ensures that messages across different forks of the chain become invalid, and that validators won't be slashed for signing attestations on two different chains (this is allowed).
And, now, the root hash of the validator Merkle tree at Genesis is included. Along with the fork version this gives a unique identifier for our chain.

This function is mainly used by get_domain(). It is also used in deposit processing, in which case fork_version and genesis_validators_root take their default values since deposits are valid across forks.

Fun fact: this function looks pretty simple, but I found a subtle bug in the way tests were generated in a previous implementation.


Used by	`get_domain()`, `process_deposit()`
Uses	`compute_fork_data_root()`
See also	`Domain`, `DomainType` `GENESIS_FORK_VERSION`

`compute_signing_root`

def compute_signing_root(ssz_object: SSZObject, domain: Domain) -> Root:
    """
    Return the signing root for the corresponding signing data.
    """
    return hash_tree_root(SigningData(
        object_root=hash_tree_root(ssz_object),
        domain=domain,
    ))

This is a pre-processor for signing objects with BLS signatures:

calculate the hash tree root of the object;
combine the hash tree root with the Domain inside a temporary SigningData object;
return the hash tree root of that, which is the data to be signed.

The domain is usually the output of get_domain(), which mixes in the cryptographic domain, the fork version, and the genesis validators root to the message hash. For deposits, it is the output of compute_domain(), ignoring the fork version and genesis validators root.

This is exactly equivalent to adding the domain to an object and taking the hash tree root of the whole thing. Indeed, this function used to be called compute_domain_wrapper_root().


Used by	Many places
Uses	`hash_tree_root()`
See also	`SigningData`, `Domain`

Also not immediately obvious is that there is a subtle issue with committee sizes that was discovered by formal verification, although, given the max supply of ETH it will never be triggered.↩