# Part 3: Annotated Specification

## Helper Functions

### Misc

#### `compute_shuffled_index`

```
def compute_shuffled_index(index: uint64, index_count: uint64, seed: Bytes32) -> uint64:
    """
    Return the shuffled index corresponding to ``seed`` (and ``index_count``).
    """
    assert index < index_count

    # Swap or not (https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf)
    # See the 'generalized domain' algorithm on page 3
    for current_round in range(SHUFFLE_ROUND_COUNT):
        pivot = bytes_to_uint64(hash(seed + uint_to_bytes(uint8(current_round)))[0:8]) % index_count
        flip = (pivot + index_count - index) % index_count
        position = max(index, flip)
        source = hash(
            seed
            + uint_to_bytes(uint8(current_round))
            + uint_to_bytes(uint32(position // 256))
        )
        byte = uint8(source[(position % 256) // 8])
        bit = (byte >> (position % 8)) % 2
        index = flip if bit else index
    return index
```

Selecting random, distinct committees of validators is a big part of Ethereum 2.0; it is foundational for both its scalability and security. This selection is done by shuffling.

Shuffling a list of objects is a well understood problem in computer science. Notice, however, that this routine manages to shuffle a *single index* to a new location, knowing only the total length of the list. To use the technical term for this, it is *oblivious*. To shuffle the whole list, this routine needs to be called once per validator index in the list. By construction, each input index maps to a distinct output index. Thus, when applied to all indices in the list, it results in a permutation, also called a shuffling.

Why do this rather than a simpler, more efficient, conventional shuffle? It's all about light clients. Beacon nodes will generally need to know the whole shuffling, but light clients will often be interested only in a small number of committees. Using this technique allows the composition of a single committee to be calculated without having to shuffle the entire set: potentially a big saving on time and memory.

As stated in the code comments, this is an implementation of the "swap-or-not" shuffle, described in the cited paper. Vitalik kicked off a search for a shuffle with these properties in late 2018. With the help of Professor Dan Boneh of Stanford University, the swap-or-not was identified as a candidate a couple of months later, and adopted into the spec.

The algorithm breaks down as follows. For each iteration (each round), we start with a current `index`.

- Pseudo-randomly select a pivot. This is a 64-bit integer based on the seed and current round number. This domain is large enough that any non-uniformity caused by taking the modulus in the next step is entirely negligible.
- Use `pivot` to find another index in the list of validators, `flip`, which is `pivot - index` accounting for wrap-around in the list.
- Calculate a single pseudo-random bit based on the seed, the current round number, and some bytes from either `index` or `flip`, depending on which is greater.
- If our bit is zero, we keep `index` unchanged; if it is one, we set `index` to `flip`.

We are effectively swapping cards in a deck based on a deterministic algorithm.
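To make the mechanics concrete, here is a plain-Python sketch of the routine, with SHA-256 standing in for the spec's `hash` and little-endian serialisation as in the spec, together with a check that shuffling every index really does produce a permutation:

```python
import hashlib

SHUFFLE_ROUND_COUNT = 90  # mainnet preset

def compute_shuffled_index(index: int, index_count: int, seed: bytes) -> int:
    """Plain-Python port of the spec function."""
    assert index < index_count
    for current_round in range(SHUFFLE_ROUND_COUNT):
        round_byte = current_round.to_bytes(1, "little")
        # Pseudo-randomly select a pivot for this round
        pivot = int.from_bytes(
            hashlib.sha256(seed + round_byte).digest()[0:8], "little"
        ) % index_count
        flip = (pivot + index_count - index) % index_count
        position = max(index, flip)
        # One pseudo-random bit decides whether to swap index and flip
        source = hashlib.sha256(
            seed + round_byte + (position // 256).to_bytes(4, "little")
        ).digest()
        byte = source[(position % 256) // 8]
        bit = (byte >> (position % 8)) % 2
        index = flip if bit else index
    return index

# Each input index maps to a distinct output index, so applying the routine
# to every index yields a permutation of the whole list.
seed = b"\x2a" * 32
shuffled = [compute_shuffled_index(i, 100, seed) for i in range(100)]
assert sorted(shuffled) == list(range(100))
```

Note that each call needs only the index, the list length, and the seed, which is precisely the obliviousness property described above.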

The way that `position` is broken down is worth noting:

- Bits 0-2 (3 bits) are used to select a single bit from the eight bits of `byte`.
- Bits 3-7 (5 bits) are used to select a single byte from the thirty-two bytes of `source`.
- Bits 8-39 (32 bits) are used in generating `source`. Note that the upper two bytes of this will always be zero in practice, due to limits on the number of active validators.

`SHUFFLE_ROUND_COUNT` is, and always has been, 90 in the mainnet configuration, as explained in the description of that preset.

See the section on Shuffling for a more structured exposition and analysis of this algorithm (with diagrams!).

In practice, full beacon node implementations will run this once per epoch using an optimised version that shuffles the whole list, and cache the result of that for the epoch.
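A minimal sketch of such a whole-list shuffle is below (plain Python, SHA-256 for `hash`, and without the chunk-caching optimisations real clients use). The key observation is that each round pairs up indices and either swaps each pair or not; applying the rounds in reverse order while swapping in place reproduces the per-index mapping:

```python
import hashlib

SHUFFLE_ROUND_COUNT = 90  # mainnet preset

def compute_shuffled_index(index: int, index_count: int, seed: bytes) -> int:
    """Per-index swap-or-not, a direct port of the spec function."""
    assert index < index_count
    for r in range(SHUFFLE_ROUND_COUNT):
        rb = r.to_bytes(1, "little")
        pivot = int.from_bytes(hashlib.sha256(seed + rb).digest()[:8], "little") % index_count
        flip = (pivot + index_count - index) % index_count
        position = max(index, flip)
        source = hashlib.sha256(seed + rb + (position // 256).to_bytes(4, "little")).digest()
        bit = (source[(position % 256) // 8] >> (position % 8)) % 2
        index = flip if bit else index
    return index

def shuffle_list(items: list, seed: bytes) -> list:
    """Whole-list shuffle: one pass of pairwise swaps per round.
    Rounds run in reverse so that output[i] == items[compute_shuffled_index(i)]."""
    shuffled = list(items)
    n = len(shuffled)
    for r in reversed(range(SHUFFLE_ROUND_COUNT)):
        rb = r.to_bytes(1, "little")
        pivot = int.from_bytes(hashlib.sha256(seed + rb).digest()[:8], "little") % n
        for i in range(n):
            flip = (pivot + n - i) % n
            if i >= flip:
                continue  # handle each (i, flip) pair once, from the smaller side
            # position = max(i, flip) = flip here
            source = hashlib.sha256(seed + rb + (flip // 256).to_bytes(4, "little")).digest()
            bit = (source[(flip % 256) // 8] >> (flip % 8)) % 2
            if bit:
                shuffled[i], shuffled[flip] = shuffled[flip], shuffled[i]
    return shuffled

# The batch shuffle agrees with the per-index routine.
items = list(range(100))
seed = b"\x2a" * 32
assert shuffle_list(items, seed) == [items[compute_shuffled_index(i, 100, seed)] for i in range(100)]
```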

|  |  |
|---|---|
| Used by | `compute_committee()`, `compute_proposer_index()`, `get_next_sync_committee_indices()` |
| Uses | `bytes_to_uint64()` |
| See also | `SHUFFLE_ROUND_COUNT` |

#### `compute_proposer_index`

```
def compute_proposer_index(state: BeaconState, indices: Sequence[ValidatorIndex], seed: Bytes32) -> ValidatorIndex:
    """
    Return from ``indices`` a random index sampled by effective balance.
    """
    assert len(indices) > 0
    MAX_RANDOM_BYTE = 2**8 - 1
    i = uint64(0)
    total = uint64(len(indices))
    while True:
        candidate_index = indices[compute_shuffled_index(i % total, total, seed)]
        random_byte = hash(seed + uint_to_bytes(uint64(i // 32)))[i % 32]
        effective_balance = state.validators[candidate_index].effective_balance
        if effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
            return candidate_index
        i += 1
```

There is exactly one beacon block proposer per slot, selected randomly from among all the active validators. The seed parameter is set in `get_beacon_proposer_index` based on the epoch and slot. Note that there is a small but finite probability of the same validator being called on to propose a block more than once in an epoch.

A validator's chance of being the proposer is weighted by its effective balance: a validator with a 32 Ether effective balance is twice as likely to be chosen as a validator with a 16 Ether effective balance.

To account for the need to weight by effective balance, this function implements a try-and-increment algorithm. A counter `i` starts at zero. This counter does double duty:

- First, `i` is used to uniformly select a candidate proposer with probability $1/N$, where $N$ is the number of active validators. This is done by using the `compute_shuffled_index` routine to shuffle index `i` to a new location, which is then the `candidate_index`.
- Then, `i` is used to generate a pseudo-random byte, using the hash function as a seeded PRNG with at least 256 bits of output. The lower 5 bits of `i` select a byte in the hash function, and the upper bits salt the seed. (An obvious optimisation is that the output of the hash changes only once every 32 iterations.)

The `if` test is where the weighting by effective balance is done. If the candidate has `MAX_EFFECTIVE_BALANCE`, it will always pass this test and be returned as the proposer. If the candidate has a fraction of `MAX_EFFECTIVE_BALANCE`, then that fraction is the probability of being returned as proposer.

If the candidate is not chosen, then `i` is incremented and we try again. Since the minimum effective balance is half of the maximum, this ought to terminate fairly swiftly. In the worst case, all validators have 16 Ether effective balance, so the chance of having to do another iteration is 50%, in which case there is only a one in a million chance of needing more than 20 iterations.
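The try-and-increment loop can be sketched in plain Python as follows. This is illustrative only: the hypothetical `sample_by_balance` below replaces the spec's shuffled candidate order with simple sequential order, and SHA-256 stands in for `hash`.

```python
import hashlib

MAX_EFFECTIVE_BALANCE = 32 * 10**9  # Gwei
MAX_RANDOM_BYTE = 2**8 - 1

def sample_by_balance(balances: list, seed: bytes) -> int:
    """Try-and-increment sketch: candidates are drawn in a fixed order (a
    stand-in for the spec's shuffled order) and accepted with probability
    effective_balance / MAX_EFFECTIVE_BALANCE."""
    total = len(balances)
    i = 0
    while True:
        candidate = i % total  # the spec uses compute_shuffled_index here
        # One fresh 32-byte hash serves 32 iterations; i % 32 picks the byte
        random_byte = hashlib.sha256(seed + (i // 32).to_bytes(8, "little")).digest()[i % 32]
        if balances[candidate] * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
            return candidate
        i += 1

# A validator with MAX_EFFECTIVE_BALANCE always passes the acceptance test,
# so the first candidate tried is returned immediately here.
assert sample_by_balance([MAX_EFFECTIVE_BALANCE, MAX_EFFECTIVE_BALANCE // 2], b"\x00" * 32) == 0
```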

Note that this dependence on the validators' effective balances, which are updated at the end of each epoch, means that proposer assignments are valid only in the current epoch. This is different from attestation committee assignments, which are valid with a one epoch look-ahead.

|  |  |
|---|---|
| Used by | `get_beacon_proposer_index()` |
| Uses | `compute_shuffled_index()` |
| See also | `MAX_EFFECTIVE_BALANCE` |

#### `compute_committee`

```
def compute_committee(indices: Sequence[ValidatorIndex],
seed: Bytes32,
index: uint64,
count: uint64) -> Sequence[ValidatorIndex]:
"""
Return the committee corresponding to ``indices``, ``seed``, ``index``, and committee ``count``.
"""
start = (len(indices) * index) // count
end = (len(indices) * uint64(index + 1)) // count
return [indices[compute_shuffled_index(uint64(i), uint64(len(indices)), seed)] for i in range(start, end)]
```

`compute_committee` is used by `get_beacon_committee()` to find the specific members of one of the committees at a slot.

Every epoch, a fresh set of committees is generated; during an epoch, the committees are stable.

Looking at the parameters in reverse order:

- `count` is the total number of committees in an epoch. This is `SLOTS_PER_EPOCH` times the output of `get_committee_count_per_slot()`.
- `index` is the committee number within the epoch, running from `0` to `count - 1`. It is calculated in `get_beacon_committee()` from the slot number and the committee number within the slot, `index`, as `(slot % SLOTS_PER_EPOCH) * committees_per_slot + index`.
- `seed` is the seed value for computing the pseudo-random shuffling, based on the epoch number and a domain parameter (`get_beacon_committee()` uses `DOMAIN_BEACON_ATTESTER`).
- `indices` is the list of validators eligible for inclusion in committees, namely the whole list of indices of active validators.

Random sampling among the validators is done by taking a contiguous slice of array indices from `start` to `end` and seeing where each one gets shuffled to by `compute_shuffled_index()`. Note that `uint64(i)` is a type-cast in the above: it just gives `i` the correct type for input into the shuffling. The output value of the shuffling is then used as an index into the `indices` list. There is much here that client implementations will optimise with caching and batch operations.

It may not be immediately obvious, but not all committees returned will be the same size (they can vary by one), and every validator in `indices` will be a member of exactly one committee. As we increment `index` from zero, clearly `start` for `index == j + 1` is `end` for `index == j`, so there are no gaps. In addition, the highest `index` is `count - 1`, so every validator in `indices` finds its way into a committee.[^1]
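This partitioning argument can be checked directly with the integer arithmetic from `compute_committee` (plain Python, illustrative; `committee_bounds` is a hypothetical helper name):

```python
def committee_bounds(n_validators: int, count: int) -> list:
    """(start, end) pairs for each committee index, as in compute_committee."""
    return [((n_validators * i) // count, (n_validators * (i + 1)) // count)
            for i in range(count)]

bounds = committee_bounds(1000, 48)
sizes = [end - start for start, end in bounds]

assert max(sizes) - min(sizes) <= 1  # committee sizes vary by at most one
assert sum(sizes) == 1000            # every validator is in exactly one committee
assert all(bounds[j][1] == bounds[j + 1][0] for j in range(47))  # no gaps
```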

This method of selecting committees is light client friendly. Light clients can compute only the committees that they are interested in without needing to deal with the entire validator set. See the section on Shuffling for explanation of how this works.

Sync committees are assigned by a different process that is more akin to repeatedly performing `compute_proposer_index()`.

|  |  |
|---|---|
| Used by | `get_beacon_committee` |
| Uses | `compute_shuffled_index()` |

#### `compute_epoch_at_slot`

```
def compute_epoch_at_slot(slot: Slot) -> Epoch:
"""
Return the epoch number at ``slot``.
"""
return Epoch(slot // SLOTS_PER_EPOCH)
```

This is trivial enough that I won't explain it. But note that it does rely on `GENESIS_SLOT` and `GENESIS_EPOCH` being zero. The more pernickety among us might prefer it to read,

` return GENESIS_EPOCH + Epoch((slot - GENESIS_SLOT) // SLOTS_PER_EPOCH)`

#### `compute_start_slot_at_epoch`

```
def compute_start_slot_at_epoch(epoch: Epoch) -> Slot:
"""
Return the start slot of ``epoch``.
"""
return Slot(epoch * SLOTS_PER_EPOCH)
```

Maybe should read,

` return GENESIS_SLOT + Slot((epoch - GENESIS_EPOCH) * SLOTS_PER_EPOCH)`

|  |  |
|---|---|
| Used by | `get_block_root()` |
| See also | `SLOTS_PER_EPOCH`, `GENESIS_SLOT`, `GENESIS_EPOCH` |

#### `compute_activation_exit_epoch`

```
def compute_activation_exit_epoch(epoch: Epoch) -> Epoch:
"""
Return the epoch during which validator activations and exits initiated in ``epoch`` take effect.
"""
return Epoch(epoch + 1 + MAX_SEED_LOOKAHEAD)
```

When queuing validators for activation or exit in `process_registry_updates()` and `initiate_validator_exit()` respectively, the activation or exit is delayed until the next epoch, plus `MAX_SEED_LOOKAHEAD` epochs, currently 4.

See `MAX_SEED_LOOKAHEAD` for the details, but in short it is designed to make it extremely hard for an attacker to manipulate the make-up of committees via activations and exits.
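As a worked example of the lookahead arithmetic (plain Python, with the mainnet value of `MAX_SEED_LOOKAHEAD`):

```python
MAX_SEED_LOOKAHEAD = 4  # mainnet preset

def compute_activation_exit_epoch(epoch: int) -> int:
    # Earliest epoch at which an activation or exit initiated in `epoch`
    # can take effect: the next epoch, plus the seed lookahead.
    return epoch + 1 + MAX_SEED_LOOKAHEAD

# An exit initiated during epoch 100 takes effect no earlier than epoch 105.
assert compute_activation_exit_epoch(100) == 105
```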

|  |  |
|---|---|
| Used by | `initiate_validator_exit()`, `process_registry_updates()` |
| See also | `MAX_SEED_LOOKAHEAD` |

#### `compute_fork_data_root`

```
def compute_fork_data_root(current_version: Version, genesis_validators_root: Root) -> Root:
"""
Return the 32-byte fork data root for the ``current_version`` and ``genesis_validators_root``.
This is used primarily in signature domains to avoid collisions across forks/chains.
"""
return hash_tree_root(ForkData(
current_version=current_version,
genesis_validators_root=genesis_validators_root,
))
```

The fork data root serves as a unique identifier for the chain that we are on. `genesis_validators_root` identifies our unique genesis event, and `current_version` our own hard fork subsequent to that genesis event. This is useful, for example, to differentiate between a testnet and mainnet: both might have the same fork versions, but will definitely have different genesis validators roots.

It is used by `compute_fork_digest()` and `compute_domain()`.

|  |  |
|---|---|
| Used by | `compute_fork_digest()`, `compute_domain()` |
| Uses | `hash_tree_root()` |
| See also | `ForkData` |

#### `compute_fork_digest`

```
def compute_fork_digest(current_version: Version, genesis_validators_root: Root) -> ForkDigest:
"""
Return the 4-byte fork digest for the ``current_version`` and ``genesis_validators_root``.
This is a digest primarily used for domain separation on the p2p layer.
4-bytes suffices for practical separation of forks/chains.
"""
return ForkDigest(compute_fork_data_root(current_version, genesis_validators_root)[:4])
```

Extracts the first four bytes of the fork data root as a `ForkDigest` type. It is primarily used for domain separation on the peer-to-peer networking layer.

`compute_fork_digest()` is used extensively in the Ethereum 2.0 networking specification to distinguish between independent beacon chain networks or forks: it is important that activity on one chain does not interfere with other chains.

|  |  |
|---|---|
| Uses | `compute_fork_data_root()` |
| See also | `ForkDigest` |

#### `compute_domain`

```
def compute_domain(domain_type: DomainType, fork_version: Version=None, genesis_validators_root: Root=None) -> Domain:
"""
Return the domain for the ``domain_type`` and ``fork_version``.
"""
if fork_version is None:
fork_version = GENESIS_FORK_VERSION
if genesis_validators_root is None:
genesis_validators_root = Root() # all bytes zero by default
fork_data_root = compute_fork_data_root(fork_version, genesis_validators_root)
return Domain(domain_type + fork_data_root[:28])
```

When dealing with signed messages, the signature "domains" are separated according to three independent factors:

- All signatures include a `DomainType` relevant to the message's purpose, which is just some cryptographic hygiene in case the same message is to be signed for different purposes at any point.
- All but signatures on deposit messages include the fork version. This ensures that messages across different forks of the chain become invalid, and that validators won't be slashed for signing attestations on two different chains (this is allowed).
- And, now, the root hash of the validator Merkle tree at Genesis is included. Along with the fork version this gives a unique identifier for our chain.

This function is mainly used by `get_domain()`. It is also used in deposit processing, in which case `fork_version` and `genesis_validators_root` take their default values, since deposits are valid across forks.
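Because `ForkData` is a small fixed-size container, its SSZ `hash_tree_root` reduces to a single SHA-256 over two packed 32-byte chunks, so the whole construction can be sketched in plain Python (illustrative only; real implementations use a full SSZ library; the default values below mirror the spec's `GENESIS_FORK_VERSION` of `0x00000000` and all-zero root):

```python
import hashlib

def compute_fork_data_root(current_version: bytes, genesis_validators_root: bytes) -> bytes:
    # ForkData has two fields, so its hash_tree_root is one SHA-256 over two
    # 32-byte chunks: the 4-byte Version right-padded with zeros, then the root.
    assert len(current_version) == 4 and len(genesis_validators_root) == 32
    return hashlib.sha256(current_version.ljust(32, b"\x00") + genesis_validators_root).digest()

def compute_domain(domain_type: bytes, fork_version: bytes = b"\x00" * 4,
                   genesis_validators_root: bytes = b"\x00" * 32) -> bytes:
    # Domain = 4-byte DomainType followed by the first 28 bytes of the fork data root
    return domain_type + compute_fork_data_root(fork_version, genesis_validators_root)[:28]

# Deposit-style usage: DomainType plus the spec defaults for the other inputs.
domain = compute_domain(b"\x03\x00\x00\x00")
assert len(domain) == 32 and domain[:4] == b"\x03\x00\x00\x00"
```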

Fun fact: this function looks pretty simple, but I found a subtle bug in the way tests were generated in a previous implementation.

|  |  |
|---|---|
| Used by | `get_domain()`, `process_deposit()` |
| Uses | `compute_fork_data_root()` |
| See also | `Domain`, `DomainType`, `GENESIS_FORK_VERSION` |

#### `compute_signing_root`

```
def compute_signing_root(ssz_object: SSZObject, domain: Domain) -> Root:
"""
Return the signing root for the corresponding signing data.
"""
return hash_tree_root(SigningData(
object_root=hash_tree_root(ssz_object),
domain=domain,
))
```

This is a pre-processor for signing objects with BLS signatures:

- calculate the hash tree root of the object;
- combine the hash tree root with the `Domain` inside a temporary `SigningData` object;
- return the hash tree root of that, which is the data to be signed.

The `domain` is usually the output of `get_domain()`, which mixes the cryptographic domain, the fork version, and the genesis validators root into the message hash. For deposits, it is the output of `compute_domain()`, ignoring the fork version and genesis validators root.

This is exactly equivalent to adding the domain to an object and taking the hash tree root of the whole thing. Indeed, this function used to be called `compute_domain_wrapper_root()`.

|  |  |
|---|---|
| Used by | Many places |
| Uses | `hash_tree_root()` |
| See also | `SigningData`, `Domain` |

[^1]: Also not immediately obvious is that there is a subtle issue with committee sizes that was discovered by formal verification although, given the max supply of ETH, it will never be triggered.