Part 3: Annotated Specification
Types, Constants, Presets, and Configuration
The specification defines the following Python custom types, "for type hinting and readability": the data types defined here appear frequently throughout the spec; they are the building blocks for everything else.
Each type has a name, an "SSZ equivalent", and a description. SSZ is the encoding method used to pass data between clients, among other things. Here it can be thought of as just a primitive data type.
Throughout the spec, (almost) all integers are unsigned 64-bit numbers, uint64, but this hasn't always been the case.
Regarding "unsigned", there was much discussion around whether Eth2 should use signed or unsigned integers, and eventually unsigned was chosen. As a result, it is critical to preserve the order of operations in some places to avoid inadvertently causing underflows since negative numbers are forbidden.
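The hazard can be sketched in a few lines. Python's own integers are signed and unbounded, so this sketch uses two hypothetical helpers, `checked_add` and `checked_sub` (neither is part of the spec), to mimic `uint64` behaviour. The balance, penalty, and reward figures are made up for illustration.

```python
UINT64_MAX = 2**64 - 1


def checked_add(a: int, b: int) -> int:
    """Add two uint64 values, failing if the result exceeds 64 bits."""
    result = a + b
    if result > UINT64_MAX:
        raise OverflowError("uint64 overflow")
    return result


def checked_sub(a: int, b: int) -> int:
    """Subtract b from a, failing if the result would be negative."""
    if b > a:
        raise OverflowError("uint64 underflow")
    return a - b


# Illustrative values only.
balance, penalty, reward = 5, 8, 10

# Adding first keeps every intermediate value non-negative: (5 + 10) - 8.
safe = checked_sub(checked_add(balance, reward), penalty)

# Subtracting first evaluates 5 - 8, which has no uint64 representation.
try:
    checked_sub(balance, penalty)
    underflowed = False
except OverflowError:
    underflowed = True
```

Both orderings are mathematically equivalent, but only the first is safe with unsigned integers.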
And regarding "64 bit", early versions of the spec used other bit lengths than 64 (a "premature optimisation"), but arithmetic integers are now standardised at 64 bits throughout the spec, the only exception being ParticipationFlags, introduced in the Altair fork, which has type uint8, and is really a
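| Name | SSZ equivalent | Description |
|---|---|---|
| Slot | uint64 | a slot number |
| Epoch | uint64 | an epoch number |
| CommitteeIndex | uint64 | a committee index at a slot |
| ValidatorIndex | uint64 | a validator registry index |
| Gwei | uint64 | an amount in Gwei |
| Root | Bytes32 | a Merkle root |
| Hash32 | Bytes32 | a 256-bit hash |
| Version | Bytes4 | a fork version number |
| DomainType | Bytes4 | a domain type |
| ForkDigest | Bytes4 | a digest of the current fork data |
| Domain | Bytes32 | a signature domain |
| BLSPubkey | Bytes48 | a BLS12-381 public key |
| BLSSignature | Bytes96 | a BLS12-381 signature |
| ParticipationFlags | uint8 | a succinct representation of 8 boolean participation flags |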
Time is divided into fixed length slots. Within each slot, exactly one validator is randomly selected to propose a beacon chain block. The progress of slots is the fundamental heartbeat of the beacon chain.
Sequences of slots are combined into fixed-length epochs.
Epoch boundaries are the points at which the chain can be justified and finalised (by the Casper FFG mechanism). They are also the points at which validator balances are updated, validator committees get shuffled, and validator exits, entries, and slashings are processed. That is, the main state-transition work is performed per epoch, not per slot.
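The relationship between slots and epochs is simple integer arithmetic. The sketch below assumes 32 slots per epoch, which is the mainnet value of the `SLOTS_PER_EPOCH` preset but is not defined in this section. The first two function names follow the spec's helpers; `is_epoch_start` is a hypothetical helper added for illustration.

```python
# Assumption: mainnet preset value, defined elsewhere in the spec.
SLOTS_PER_EPOCH = 32


def compute_epoch_at_slot(slot: int) -> int:
    """Return the epoch that contains the given slot."""
    return slot // SLOTS_PER_EPOCH


def compute_start_slot_at_epoch(epoch: int) -> int:
    """Return the first slot of the given epoch."""
    return epoch * SLOTS_PER_EPOCH


def is_epoch_start(slot: int) -> bool:
    """Hypothetical helper: True if the slot is the first of its epoch."""
    return slot % SLOTS_PER_EPOCH == 0


# Slots 0..31 belong to epoch 0, slots 32..63 to epoch 1, and so on.
epochs = [compute_epoch_at_slot(s) for s in (0, 31, 32, 63, 64)]
```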
Epochs have always felt like a slightly uncomfortable overlay on top of the slot-by-slot progress of the beacon chain, but necessitated by Casper FFG finality. There have been proposals to move away from epochs, and there are possible future developments that could allow us to do away with epochs entirely. But, for the time being, they remain.
Fun fact: Epochs were originally called Cycles.
Validators are organised into committees that collectively vote (make attestations) on blocks. Each committee is active at exactly one slot per epoch, but several committees are active at each slot. The CommitteeIndex type is an index into the list of committees active at a slot.
The beacon chain's committee-based design is a large part of what makes it practical to implement while maintaining security. If all validators were active all the time, there would be an overwhelming number of messages to deal with. The random shuffling of committees makes them very hard for an attacker to subvert without a supermajority of stake.
Each validator making a successful deposit is consecutively assigned a unique validator index number that is permanent, remaining even after the validator exits. It is permanent because the validator's balance is associated with its index, so the data needs to be preserved when the validator exits, at least until the balance is withdrawn at an unknown future time.
All Ether amounts are specified in units of Gwei ($10^9$ Wei, $10^{-9}$ Ether). This is basically a hack to avoid having to use integers wider than 64 bits to store validator balances and while doing calculations, since $2^{64}$ Wei is only 18 Ether. Even so, in some places care needs to be taken to avoid arithmetic overflow when dealing with Ether calculations.
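A quick calculation shows why Gwei was chosen and why overflow is still a concern. The constant names here are my own, not the spec's, and the 32 Ether balance is just an illustrative figure.

```python
UINT64_MAX = 2**64 - 1
WEI_PER_GWEI = 10**9
GWEI_PER_ETHER = 10**9
WEI_PER_ETHER = WEI_PER_GWEI * GWEI_PER_ETHER

# Largest whole number of Ether a uint64 can hold if the unit is Wei.
max_ether_if_wei = UINT64_MAX // WEI_PER_ETHER

# Largest whole number of Ether a uint64 can hold if the unit is Gwei.
max_ether_if_gwei = UINT64_MAX // GWEI_PER_ETHER

# Even in Gwei, multiplying two balances together can exceed 64 bits,
# so intermediate products in calculations need care.
balance_gwei = 32 * GWEI_PER_ETHER  # illustrative balance
product_overflows = balance_gwei * balance_gwei > UINT64_MAX
```

Storing balances in Gwei raises the ceiling from about 18 Ether to about 18 billion Ether, but a product of two Gwei amounts can still overflow.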
Merkle roots are ubiquitous in the Eth2 protocol. They are a very succinct and tamper-proof way of representing a lot of data, an example of a cryptographic accumulator. Blocks are summarised by their Merkle roots; state is summarised by its Merkle root; the list of Eth1 deposits is summarised by its Merkle root; the digital signature of a message is calculated from the Merkle root of the data structure contained within the message.
Merkle roots are constructed with cryptographic hash functions. In the spec, a Hash32 type is used to represent Eth1 block roots (which are also Merkle roots).
I don't know why only the Eth1 block hash has been awarded the Hash32 type: other hashes in the spec remain Bytes32. In early versions of the spec Hash32 was used for all cryptographic hash quantities, but this was changed to Bytes32.
Anyway, it's worth taking a moment in appreciation of the humble cryptographic hash function. The hash function is arguably the single most important algorithmic innovation underpinning blockchain technology, and in fact most of our online lives. Easily taken for granted, but utterly critical in enabling our modern world.
Unlike Ethereum 1, the beacon chain has an in-protocol concept of a version number. It is expected that the protocol will be updated/upgraded from time to time, a process commonly known as a "hard-fork". For example, the upgrade from Phase 0 to Altair took place on the 27th of October 2021, and was assigned its own fork version.
Version is used when computing the
DomainType is just a cryptographic nicety: messages intended for different purposes are tagged with different domains before being hashed and possibly signed. It's a kind of name-spacing to avoid clashes; probably unnecessary, but considered a best-practice. Ten domain types are defined in Altair.
ForkDigest is the unique chain identifier, generated by combining information gathered at genesis with the current chain
ForkDigest serves two purposes.
- Within the consensus protocol to prevent, for example, attestations from validators on one fork (that maybe haven't upgraded yet) being counted on a different fork.
- Within the networking protocol to help to distinguish between useful peers that are on the same chain, and useless peers that are on a different chain. This usage is described in the Ethereum 2.0 networking specification.
ForkDigest is the first four bytes of the hash tree root of the ForkData object containing the current chain Version and the genesis_validators_root, which was created at beacon chain initialisation. It is computed in
Domain is used when verifying protocol messages from validators. To be valid, a message must have been combined with both the correct domain and the correct fork version. It is calculated as the concatenation of the four-byte DomainType and the first 28 bytes of the fork data root.
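The byte layouts of ForkDigest and Domain can be sketched as follows. This is a simplification under stated assumptions: rather than using a real SSZ library, it hand-rolls the hash tree root of the two-field ForkData container (the four-byte version right-padded to a 32-byte chunk, then SHA-256 over the two chunks), which only works for this particular shape. The function names mirror the spec's helpers as I understand them, the genesis validators root is a placeholder, and the version and domain type values are examples.

```python
from hashlib import sha256


def compute_fork_data_root(current_version: bytes, genesis_validators_root: bytes) -> bytes:
    """Hand-rolled hash tree root of ForkData (two fields, two 32-byte chunks)."""
    assert len(current_version) == 4
    assert len(genesis_validators_root) == 32
    version_chunk = current_version.ljust(32, b"\x00")  # Bytes4 padded to a full chunk
    return sha256(version_chunk + genesis_validators_root).digest()


def compute_fork_digest(current_version: bytes, genesis_validators_root: bytes) -> bytes:
    """ForkDigest: the first four bytes of the fork data root."""
    return compute_fork_data_root(current_version, genesis_validators_root)[:4]


def compute_domain(domain_type: bytes, fork_version: bytes, genesis_validators_root: bytes) -> bytes:
    """Domain: four-byte DomainType followed by 28 bytes of the fork data root."""
    assert len(domain_type) == 4
    root = compute_fork_data_root(fork_version, genesis_validators_root)
    return domain_type + root[:28]


# Example inputs (the genesis validators root is a placeholder, not a real chain's).
version = bytes.fromhex("01000000")
genesis_validators_root = b"\x11" * 32
domain_type = bytes.fromhex("01000000")

root = compute_fork_data_root(version, genesis_validators_root)
digest = compute_fork_digest(version, genesis_validators_root)
domain = compute_domain(domain_type, version, genesis_validators_root)
```

Changing either the fork version or the genesis validators root changes the fork data root, and therefore both the digest and every domain derived from it.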
BLS (Boneh-Lynn-Shacham) is the digital signature scheme used by Eth2. It has some very nice properties, in particular the ability to aggregate signatures. This means that many validators can sign the same message (for example, that they support block X), and these signatures can all be efficiently aggregated into a single signature for verification. The ability to do this efficiently makes Eth2 practical as a protocol. Several other protocols have adopted or will adopt BLS, such as Zcash, Chia, Dfinity and Algorand. We are using the BLS signature scheme based on the BLS12-381 (Barreto-Lynn-Scott) elliptic curve.
The BLSPubkey type holds a validator's public key, or the aggregation of several validators' public keys. This is used to verify messages that are claimed to have come from that validator or group of validators.
In Ethereum 2.0, BLS public keys are elliptic curve points from the BLS12-381 $G_1$ group, thus are 48 bytes long when compressed.
See the section on BLS signatures in part 2 for a more in-depth look at these things.
As above, we are using BLS signatures over the BLS12-381 elliptic curve in order to sign messages between participants. As with all digital signature schemes, this guarantees both the identity of the sender and the integrity of the contents of any message.
In Ethereum 2.0, BLS signatures are elliptic curve points from the BLS12-381 $G_2$ group, thus are 96 bytes long when compressed.
The ParticipationFlags type was introduced in the Altair upgrade as part of the accounting reforms.
Prior to Altair, all attestations seen in blocks were stored in state for two epochs. At the end of an epoch, finality calculations, and reward and penalty calculations for each active validator, would be done by processing all of the attestations for the previous epoch as a batch. This created a spike in processing at epoch boundaries, and led to a noticeable increase in late blocks and attestations during the first slots of epochs. With Altair, participation flags are now used to continuously track validators' attestations, reducing the processing load at the end of epochs.
Three of the eight bits are currently used; five are reserved for future use.
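Setting and testing individual flags is plain bit manipulation on the `uint8`. The sketch below is modelled on the Altair spec's `add_flag` and `has_flag` helpers and its three flag indices, none of which are defined in this section, so treat the names and index values as assumptions to check against the spec.

```python
# Flag indices as I understand them from the Altair spec (assumption).
TIMELY_SOURCE_FLAG_INDEX = 0
TIMELY_TARGET_FLAG_INDEX = 1
TIMELY_HEAD_FLAG_INDEX = 2


def add_flag(flags: int, flag_index: int) -> int:
    """Return flags with the bit at flag_index set."""
    return flags | (1 << flag_index)


def has_flag(flags: int, flag_index: int) -> bool:
    """Return True if the bit at flag_index is set."""
    flag = 1 << flag_index
    return (flags & flag) == flag


# A validator whose attestation had a timely source and head, but not target.
flags = 0
flags = add_flag(flags, TIMELY_SOURCE_FLAG_INDEX)
flags = add_flag(flags, TIMELY_HEAD_FLAG_INDEX)
```

Because each flag is an independent bit, flags can be set one at a time as attestations arrive, without revisiting earlier ones.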
As an aside, it might have been more intuitive if ParticipationFlags were a Bytes1 type, rather than introducing a weird uint8 into the spec. After all, it is not used as an arithmetic integer. However, Bytes1 is a composite type in SSZ, really an alias for Vector[uint8, 1], whereas uint8 is a basic type. When computing the hash tree root of a List type, multiple basic types can be packed into a single leaf, while composite types take a leaf each. This would result in 32 times as many hashing operations for a list of Bytes1. For similar reasons the type of ParticipationFlags was changed from