Part 3: Annotated Specification

Types, Constants, Presets, and Configuration

Custom Types

The specification defines the following Python custom types, "for type hinting and readability". The data types defined here appear frequently throughout the spec; they are the building blocks for everything else.

Each type has a name, an "SSZ equivalent", and a description. SSZ is the encoding method used to pass data between clients, among other things. Here it can be thought of as just a primitive data type.

Throughout the spec, (almost) all integers are unsigned 64 bit numbers, uint64, but this hasn't always been the case.

Regarding "unsigned", there was much discussion around whether Eth2 should use signed or unsigned integers, and eventually unsigned was chosen. As a result, it is critical to preserve the order of operations in some places to avoid inadvertently causing underflows since negative numbers are forbidden.
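To see why the order of operations matters, here is a minimal sketch. The helpers checked_add and checked_sub and the balance, reward and penalty figures are hypothetical illustrations, not spec functions; they simply mimic a uint64 that rejects out-of-range results.

```python
UINT64_MAX = 2**64 - 1


def checked_add(a: int, b: int) -> int:
    """Add as a uint64 would: reject results above 2**64 - 1."""
    result = a + b
    if result > UINT64_MAX:
        raise ValueError("uint64 overflow")
    return result


def checked_sub(a: int, b: int) -> int:
    """Subtract as a uint64 would: reject results below zero."""
    if b > a:
        raise ValueError("uint64 underflow")
    return a - b


# Hypothetical figures, chosen only to make the point.
balance, reward, penalty = 5, 10, 8

# Adding first keeps every intermediate value non-negative: (5 + 10) - 8.
safe = checked_sub(checked_add(balance, reward), penalty)

# Subtracting first hits 5 - 8 and fails, even though the final answer is valid.
try:
    checked_sub(balance, penalty)
    underflowed = False
except ValueError:
    underflowed = True
```

Both orderings are the same sum on paper, but only the first survives when every intermediate value must be a valid unsigned integer.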

And regarding "64 bit", early versions of the spec used other bit lengths than 64 (a "premature optimisation"), but arithmetic integers are now standardised at 64 bits throughout the spec. The only exception is ParticipationFlags, introduced in the Altair fork, which has type uint8, and is really a byte type.

Name                SSZ equivalent  Description
Slot                uint64          a slot number
Epoch               uint64          an epoch number
CommitteeIndex      uint64          a committee index at a slot
ValidatorIndex      uint64          a validator registry index
Gwei                uint64          an amount in Gwei
Root                Bytes32         a Merkle root
Hash32              Bytes32         a 256-bit hash
Version             Bytes4          a fork version number
DomainType          Bytes4          a domain type
ForkDigest          Bytes4          a digest of the current fork data
Domain              Bytes32         a signature domain
BLSPubkey           Bytes48         a BLS12-381 public key
BLSSignature        Bytes96         a BLS12-381 signature
ParticipationFlags  uint8           a succinct representation of 8 boolean participation flags

Slot

Time is divided into fixed-length slots. Within each slot, exactly one validator is randomly selected to propose a beacon chain block. The progress of slots is the fundamental heartbeat of the beacon chain.

Epoch

Sequences of slots are combined into fixed-length epochs.
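The relationship between slots and epochs is plain integer arithmetic. The sketch below follows the shape of the spec's compute_epoch_at_slot() and compute_start_slot_at_epoch() helpers. It assumes the mainnet preset value of 32 for SLOTS_PER_EPOCH, which is not stated in this section, and uses typing.NewType in place of the spec's SSZ-backed types.

```python
from typing import NewType

# Stand-ins for the spec's custom types (assumption: plain ints, not SSZ uint64).
Slot = NewType("Slot", int)
Epoch = NewType("Epoch", int)

SLOTS_PER_EPOCH = 32  # assumed mainnet preset value


def compute_epoch_at_slot(slot: Slot) -> Epoch:
    """Return the epoch number that contains the given slot."""
    return Epoch(slot // SLOTS_PER_EPOCH)


def compute_start_slot_at_epoch(epoch: Epoch) -> Slot:
    """Return the first slot of the given epoch."""
    return Slot(epoch * SLOTS_PER_EPOCH)


last_slot_of_epoch_0 = compute_epoch_at_slot(Slot(31))
first_slot_of_epoch_1 = compute_epoch_at_slot(Slot(32))
```

Under that assumption, slots 0 to 31 fall in epoch 0 and slot 32 opens epoch 1.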

Epoch boundaries are the points at which the chain can be justified and finalised (by the Casper FFG mechanism). They are also the points at which validator balances are updated, validator committees get shuffled, and validator exits, entries, and slashings are processed. That is, the main state-transition work is performed per epoch, not per slot.

Epochs have always felt like a slightly uncomfortable overlay on top of the slot-by-slot progress of the beacon chain, but necessitated by Casper FFG finality. There have been proposals to move away from epochs, and there are possible future developments that could allow us to do away with epochs entirely. But, for the time being, they remain.

Fun fact: Epochs were originally called Cycles.

CommitteeIndex

Validators are organised into committees that collectively vote (make attestations) on blocks. Each committee is active at exactly one slot per epoch, but several committees are active at each slot. The CommitteeIndex type is an index into the list of committees active at a slot.

The beacon chain's committee-based design is a large part of what makes it practical to implement while maintaining security. If all validators were active all the time, there would be an overwhelming number of messages to deal with. The random shuffling of committees makes them very hard to subvert by an attacker without a supermajority of stake.

ValidatorIndex

Each validator making a successful deposit is consecutively assigned a unique validator index number that is permanent, remaining even after the validator exits. It is permanent because the validator's balance is associated with its index, so the data needs to be preserved when the validator exits, at least until the balance is withdrawn at an unknown future time.

Gwei

All Ether amounts are specified in units of Gwei ($10^9$ Wei, $10^{-9}$ Ether). This is basically a hack to avoid having to use integers wider than 64 bits to store validator balances and while doing calculations, since $2^{64}$ Wei is only 18 Ether. Even so, in some places care needs to be taken to avoid arithmetic overflow when dealing with Ether calculations.
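The arithmetic behind this is easy to check. The following is a small sketch; the constant names are mine rather than the spec's, and the 32 Ether balance is just an illustrative figure.

```python
UINT64_MAX = 2**64 - 1
WEI_PER_ETHER = 10**18
GWEI_PER_ETHER = 10**9

# Largest whole number of Ether a uint64 can hold in each unit.
max_ether_in_wei = UINT64_MAX // WEI_PER_ETHER    # about 18 Ether
max_ether_in_gwei = UINT64_MAX // GWEI_PER_ETHER  # about 18 billion Ether

# Even in Gwei, intermediate products can exceed 64 bits.
# Illustrative example: multiplying two 32 Ether balances together.
balance_gwei = 32 * GWEI_PER_ETHER
product_overflows = balance_gwei * balance_gwei > UINT64_MAX
```

Python integers are arbitrary precision, so the last line merely detects the overflow; in a language with native 64-bit integers the product would wrap or trap.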

Root

Merkle roots are ubiquitous in the Eth2 protocol. They are a very succinct and tamper-proof way of representing a lot of data, an example of a cryptographic accumulator. Blocks are summarised by their Merkle roots; state is summarised by its Merkle root; the list of Eth1 deposits is summarised by its Merkle root; the digital signature of a message is calculated from the Merkle root of the data structure contained within the message.
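As a rough illustration of how a Merkle root summarises data, here is a minimal binary Merkle tree over 32-byte chunks using SHA-256. This is a simplified sketch, not the full SSZ hash tree root algorithm: it ignores type-specific packing, list limits and length mix-ins, and the function name merkle_root is my own.

```python
import hashlib
from typing import List

ZERO_CHUNK = bytes(32)


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Root of a binary Merkle tree over 32-byte chunks.

    The chunk list is padded with zero chunks up to the next power of two.
    """
    if not chunks:
        return ZERO_CHUNK
    size = 1
    while size < len(chunks):
        size *= 2
    layer = list(chunks) + [ZERO_CHUNK] * (size - len(chunks))
    # Hash adjacent pairs repeatedly until a single node remains.
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


data = [bytes([i]) * 32 for i in range(5)]
root = merkle_root(data)

# Changing a single byte anywhere in the data changes the root.
tampered = list(data)
tampered[3] = bytes([99]) * 32
tampered_root = merkle_root(tampered)
```

However much data goes in, the root is always 32 bytes, which is what makes it such a succinct commitment.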

Hash32

Merkle roots are constructed with cryptographic hash functions. In the spec, a Hash32 type is used to represent Eth1 block roots (which are also Merkle roots).

I don't know why only the Eth1 block hash has been awarded the Hash32 type: other hashes in the spec remain Bytes32. In early versions of the spec Hash32 was used for all cryptographic hash quantities, but this was changed to Bytes32.

Anyway, it's worth taking a moment in appreciation of the humble cryptographic hash function. The hash function is arguably the single most important algorithmic innovation underpinning blockchain technology, and in fact most of our online lives. Easily taken for granted, but utterly critical in enabling our modern world.

Version

Unlike Ethereum 1 [1], the beacon chain has an in-protocol concept of a version number. It is expected that the protocol will be updated/upgraded from time to time, a process commonly known as a "hard-fork". For example, the upgrade from Phase 0 to Altair took place on the 27th of October 2021, and was assigned its own fork version.

Version is used when computing the ForkDigest.

DomainType

DomainType is just a cryptographic nicety: messages intended for different purposes are tagged with different domains before being hashed and possibly signed. It's a kind of name-spacing to avoid clashes; probably unnecessary, but considered a best-practice. Ten domain types are defined in Altair.

ForkDigest

ForkDigest is the unique chain identifier, generated by combining information gathered at genesis with the current chain Version identifier.

The ForkDigest serves two purposes.

  1. Within the consensus protocol to prevent, for example, attestations from validators on one fork (that maybe haven't upgraded yet) being counted on a different fork.
  2. Within the networking protocol to help to distinguish between useful peers that are on the same chain, and useless peers that are on a different chain. This usage is described in the Ethereum 2.0 networking specification, where ForkDigest appears frequently.

Specifically, ForkDigest is the first four bytes of the hash tree root of the ForkData object containing the current chain Version and the genesis_validators_root which was created at beacon chain initialisation. It is computed in compute_fork_digest().
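The calculation just described can be sketched with nothing more than hashlib. Rather than using an SSZ library, this sketch hand-rolls the hash tree root of the two-field ForkData container: the four-byte version is right-padded to a 32-byte chunk and hashed together with genesis_validators_root. The version values and the genesis validators root below are placeholders for illustration, not real network data.

```python
import hashlib


def compute_fork_data_root(current_version: bytes, genesis_validators_root: bytes) -> bytes:
    """Hash tree root of ForkData(current_version, genesis_validators_root).

    Assumption: for a container with exactly two fields the root is
    sha256(chunk_0 + chunk_1), where each chunk is the 32-byte root of a field.
    """
    if len(current_version) != 4 or len(genesis_validators_root) != 32:
        raise ValueError("expected a 4-byte version and a 32-byte root")
    version_chunk = current_version + bytes(28)  # Bytes4 right-padded to 32 bytes
    return hashlib.sha256(version_chunk + genesis_validators_root).digest()


def compute_fork_digest(current_version: bytes, genesis_validators_root: bytes) -> bytes:
    """The fork digest is the first four bytes of the fork data root."""
    return compute_fork_data_root(current_version, genesis_validators_root)[:4]


# Placeholder inputs, made up for this example.
example_root = hashlib.sha256(b"placeholder genesis validators root").digest()
digest_v0 = compute_fork_digest(bytes.fromhex("00000000"), example_root)
digest_v1 = compute_fork_digest(bytes.fromhex("01000000"), example_root)
```

Two nodes that share a genesis but run different fork versions end up with different digests, which is exactly what lets them tell each other apart.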

Domain

Domain is used when verifying protocol messages from validators. To be valid, a message must have been combined with both the correct domain and the correct fork version. It is calculated as the concatenation of the four byte DomainType and the first 28 bytes of the fork data root.
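The concatenation is simple enough to sketch directly. The helper name assemble_domain is hypothetical and takes an already-computed fork data root; the spec's own compute_domain() derives that root from the fork version and genesis validators root first. The fork data root used here is a placeholder, and I am assuming the attester domain type value of 0x01000000.

```python
import hashlib

DOMAIN_BEACON_ATTESTER = bytes.fromhex("01000000")  # assumed domain type value


def assemble_domain(domain_type: bytes, fork_data_root: bytes) -> bytes:
    """Concatenate a 4-byte domain type with the first 28 bytes of the fork data root."""
    if len(domain_type) != 4 or len(fork_data_root) != 32:
        raise ValueError("expected a 4-byte domain type and a 32-byte root")
    return domain_type + fork_data_root[:28]


# Placeholder fork data root, made up for this example.
fork_data_root = hashlib.sha256(b"placeholder fork data").digest()
domain = assemble_domain(DOMAIN_BEACON_ATTESTER, fork_data_root)
```

The result is always 32 bytes: four identifying the purpose of the message, and 28 tying it to a particular fork of a particular chain.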

BLSPubkey

BLS (Boneh-Lynn-Shacham) is the digital signature scheme used by Eth2. It has some very nice properties, in particular the ability to aggregate signatures. This means that many validators can sign the same message (for example, that they support block X), and these signatures can all be efficiently aggregated into a single signature for verification. The ability to do this efficiently makes Eth2 practical as a protocol. Several other protocols have adopted or will adopt BLS, such as Zcash, Chia, Dfinity and Algorand. We are using the BLS signature scheme based on the BLS12-381 (Barreto-Lynn-Scott) elliptic curve.

The BLSPubkey type holds a validator's public key, or the aggregation of several validators' public keys. This is used to verify messages that are claimed to have come from that validator or group of validators.

In Ethereum 2.0, BLS public keys are elliptic curve points from the BLS12-381 $G_1$ group, thus are 48 bytes long when compressed.

See the section on BLS signatures in part 2 for a more in-depth look at these things.

BLSSignature

As above, we are using BLS signatures over the BLS12-381 elliptic curve in order to sign messages between participants. As with all digital signature schemes, this guarantees both the identity of the sender and the integrity of the contents of any message.

In Ethereum 2.0, BLS signatures are elliptic curve points from the BLS12-381 $G_2$ group, thus are 96 bytes long when compressed.

ParticipationFlags

The ParticipationFlags type was introduced in the Altair upgrade as part of the accounting reforms.

Prior to Altair, all attestations seen in blocks were stored in state for two epochs. At the end of an epoch, finality calculations, and reward and penalty calculations for each active validator, would be done by processing all of the attestations for the previous epoch as a batch. This created a spike in processing at epoch boundaries, and led to a noticeable increase in late blocks and attestations during the first slots of epochs. With Altair, participation flags are now used to continuously track validators' attestations, reducing the processing load at the end of epochs.

Three of the eight bits are currently used; five are reserved for future use.
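In practice the flags are set and tested with ordinary bit operations. The sketch below is modelled on the Altair spec's add_flag() and has_flag() helpers, using plain Python ints in place of the SSZ uint8 type. The flag index names and values are taken from my reading of the Altair constants and should be treated as assumptions here, since this section does not list them.

```python
# Assumed Altair flag indices (bit positions within the uint8).
TIMELY_SOURCE_FLAG_INDEX = 0
TIMELY_TARGET_FLAG_INDEX = 1
TIMELY_HEAD_FLAG_INDEX = 2


def add_flag(flags: int, flag_index: int) -> int:
    """Return flags with the bit at flag_index set."""
    return flags | (2**flag_index)


def has_flag(flags: int, flag_index: int) -> bool:
    """Return True if the bit at flag_index is set in flags."""
    flag = 2**flag_index
    return flags & flag == flag


# A validator whose attestation had a timely source and head, but not target.
flags = 0
flags = add_flag(flags, TIMELY_SOURCE_FLAG_INDEX)
flags = add_flag(flags, TIMELY_HEAD_FLAG_INDEX)
```

Each validator's participation for an epoch therefore fits in a single byte that can be updated as attestations arrive, rather than being reconstructed from stored attestations at the epoch boundary.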

As an aside, it might have been more intuitive if ParticipationFlags were a Bytes1 type, rather than introducing a weird uint8 into the spec. After all, it is not used as an arithmetic integer. However, Bytes1 is a composite type in SSZ, really an alias for Vector[uint8, 1], whereas uint8 is a basic type. When computing the hash tree root of a List type, multiple basic types can be packed into a single leaf, while composite types take a leaf each. This would result in 32 times as many hashing operations for a list of Bytes1. For similar reasons the type of ParticipationFlags was changed from bitlist to uint8.
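The factor of 32 comes straight from the leaf counts. Here is a back-of-the-envelope sketch; the function names are hypothetical, and it deliberately ignores padding to the list's limit and the length mix-in, counting only the leaves occupied by data.

```python
BYTES_PER_CHUNK = 32  # size of one Merkle leaf in SSZ


def leaves_for_uint8_list(n_items: int) -> int:
    """Basic types are packed: up to 32 one-byte items share a single leaf."""
    return (n_items + BYTES_PER_CHUNK - 1) // BYTES_PER_CHUNK


def leaves_for_bytes1_list(n_items: int) -> int:
    """Composite types are not packed: each item occupies its own leaf."""
    return n_items


# Illustrative list length, chosen only to make the arithmetic clean.
n_validators = 1024
packed_leaves = leaves_for_uint8_list(n_validators)
unpacked_leaves = leaves_for_bytes1_list(n_validators)
ratio = unpacked_leaves // packed_leaves
```

Since the number of hashes in a binary Merkle tree grows in proportion to the number of leaves, 32 times the leaves means roughly 32 times the hashing.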

References


  1. Ethereum 1.0 introduced a fork identifier as defined in EIP-2124 which is similar to Version, but the Eth1 fork id is not part of the consensus protocol and is used only in the networking protocol.

Created by Ben Edgington. Licensed under CC BY-SA 4.0. Published 2022-05-12 12:26 UTC. Commit 0cc9f0b.