Part 3: Annotated Specification

Helper Functions



def hash(data: bytes) -> Bytes32 is SHA256.

SHA256 was chosen as the protocol's base hash algorithm for easier cross-chain interoperability: many other chains use SHA256, and Eth1 has a SHA256 precompile.
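As a minimal sketch, the function can be reproduced with Python's standard hashlib module. The name sha256_hash is a hypothetical stand-in for the spec's hash(), used here only to avoid shadowing Python's built-in of the same name, and plain bytes stands in for the spec's Bytes32 type.

```python
import hashlib


def sha256_hash(data: bytes) -> bytes:
    """Return the 32-byte SHA256 digest of ``data``.

    Hypothetical stand-in for the spec's ``hash()``; ``bytes`` is used
    in place of the spec's ``Bytes32`` type.
    """
    return hashlib.sha256(data).digest()


# The digest is always 32 bytes long, whatever the input length.
digest = sha256_hash(b"abc")
print(len(digest), digest.hex())
```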

There was a lot of discussion about this choice early in the design process. The original plan had been to use the BLAKE2b-512 hash function – that being a modern hash function that's faster than SHA3 – and to move to a STARK/SNARK friendly hash function at some point (such as MiMC). However, to keep interoperability with Eth1, in particular for the implementation of the deposit contract, the hash function was changed to Keccak256. Finally, we settled on SHA256 as having even broader compatibility.

The hash function serves two purposes within the protocol. The main use, computationally, is in Merkleization, the computation of hash tree roots, which is ubiquitous in the protocol. Its other use is to harden the randomness used in various places.

Used by hash_tree_root(), is_valid_merkle_branch(), compute_shuffled_index(), compute_proposer_index(), get_seed(), get_beacon_proposer_index(), get_next_sync_committee_indices(), process_randao()


def hash_tree_root(object: SSZSerializable) -> Root is a function for hashing objects into a single root by utilizing a hash tree structure, as defined in the SSZ spec.

The development of the tree hashing process was transformational for the Ethereum 2.0 specification, and it is now used everywhere.

The naive way to create a digest of a data structure is to serialise it and then just run a hash function over the result. In tree hashing, the basic idea is to treat each element of an ordered, compound data structure as the leaf of a Merkle tree, recursively if necessary until a primitive type is reached, and to return the Merkle root of the resulting tree.
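The basic idea can be sketched as follows, assuming the object has already been reduced to a flat list of 32-byte leaves. This is not full SSZ Merkleization: it leaves out type-specific packing, length mix-ins, and list limits, and the names merkle_root and ZERO_CHUNK are illustrative choices rather than spec identifiers.

```python
import hashlib
from typing import List

# Assumption: leaves are padded with all-zero chunks up to a power of two.
ZERO_CHUNK = b"\x00" * 32


def merkle_root(chunks: List[bytes]) -> bytes:
    """Return the Merkle root of a list of 32-byte chunks."""
    assert all(len(chunk) == 32 for chunk in chunks)
    layer = list(chunks) or [ZERO_CHUNK]

    # Pad the bottom layer to the next power of two.
    size = 1
    while size < len(layer):
        size *= 2
    layer += [ZERO_CHUNK] * (size - len(layer))

    # Hash adjacent pairs, layer by layer, until a single root remains.
    while len(layer) > 1:
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]


leaves = [bytes([i]) * 32 for i in range(3)]
print(merkle_root(leaves).hex())
```

A compound object would apply the same procedure recursively: the root of each nested element becomes a leaf in its parent's tree.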

At first sight, this all looks quite inefficient. Twice as much data needs to be hashed when tree hashing, and actual speeds are 4-6 times slower compared with the linear hash. However, it is good for supporting light clients, because it allows Merkle proofs to be constructed easily for subsets of the full state.
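To illustrate why this helps light clients, here is a hedged sketch of building and checking a Merkle proof for one leaf of a small tree. It assumes a power-of-two number of 32-byte leaves; compute_root, build_branch, and verify_branch are hypothetical helper names. The verification loop follows the same pattern as the spec's is_valid_merkle_branch(), but is not a copy of it.

```python
import hashlib
from typing import List, Sequence


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_layer(layer: Sequence[bytes]) -> List[bytes]:
    """Hash adjacent pairs to produce the layer above."""
    return [sha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]


def compute_root(leaves: Sequence[bytes]) -> bytes:
    layer = list(leaves)
    while len(layer) > 1:
        layer = next_layer(layer)
    return layer[0]


def build_branch(leaves: Sequence[bytes], index: int) -> List[bytes]:
    """Collect the sibling hash at each level, from the leaf up to the root."""
    branch = []
    layer = list(leaves)
    while len(layer) > 1:
        branch.append(layer[index ^ 1])  # sibling of the current node
        layer = next_layer(layer)
        index //= 2
    return branch


def verify_branch(leaf: bytes, branch: Sequence[bytes], index: int, root: bytes) -> bool:
    """Recompute the root from the leaf and its siblings, then compare."""
    value = leaf
    for level, sibling in enumerate(branch):
        if (index >> level) & 1:
            value = sha256(sibling + value)  # current node is a right child
        else:
            value = sha256(value + sibling)  # current node is a left child
    return value == root


leaves = [bytes([i]) * 32 for i in range(4)]
root = compute_root(leaves)
branch = build_branch(leaves, 2)
print(verify_branch(leaves[2], branch, 2, root))
```

The verifier needs only the leaf, one sibling hash per level, and the root, rather than the whole data structure.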

The breakthrough insight was realising that much of the re-hashing work can be cached: if part of the state data structure has not changed, that part does not need to be re-hashed, and the whole subtree can be replaced with its cached hash. This turns out to be a huge efficiency boost, allowing the previous design, with cumbersome separate crystallised and active state, to be simplified into a single state object.
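The caching idea can be sketched with a toy tree that stores every internal node and, when a leaf changes, re-hashes only the path from that leaf to the root. The CachedMerkleTree class and its hash_count field are hypothetical, and the sketch assumes a fixed, power-of-two number of 32-byte leaves; real client implementations are considerably more elaborate.

```python
import hashlib
from typing import List


class CachedMerkleTree:
    """Toy Merkle tree that caches all internal nodes (heap layout)."""

    def __init__(self, leaves: List[bytes]) -> None:
        n = len(leaves)
        assert n > 0 and n & (n - 1) == 0, "leaf count must be a power of two"
        self.n = n
        self.hash_count = 0  # number of hash invocations so far
        # nodes[1] is the root; nodes[n + i] holds leaf i; nodes[0] is unused.
        self.nodes = [b""] * n + list(leaves)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = self._hash_pair(self.nodes[2 * i], self.nodes[2 * i + 1])

    def _hash_pair(self, left: bytes, right: bytes) -> bytes:
        self.hash_count += 1
        return hashlib.sha256(left + right).digest()

    @property
    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, leaf: bytes) -> None:
        """Replace one leaf and re-hash only its ancestors."""
        i = self.n + index
        self.nodes[i] = leaf
        i //= 2
        while i >= 1:
            self.nodes[i] = self._hash_pair(self.nodes[2 * i], self.nodes[2 * i + 1])
            i //= 2


leaves = [bytes([i]) * 32 for i in range(8)]
tree = CachedMerkleTree(leaves)
built = tree.hash_count           # 7 hashes to build an 8-leaf tree
tree.update(3, b"\xff" * 32)
extra = tree.hash_count - built   # 3 hashes: one per level on the path to the root
print(built, extra)
```

Unchanged subtrees keep their cached hashes, so an update costs a number of hashes proportional to the tree depth rather than to the number of leaves.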

Merkleization, the process of calculating the hash_tree_root() of an object, is defined in the SSZ specification, and explained further in the section on SSZ.

BLS signatures

See the main write-up on BLS Signatures for a more in-depth exploration of this topic.

The IETF BLS signature draft standard v4 with ciphersuite BLS_SIG_BLS12381G2_XMD:SHA-256_SSWU_RO_POP_ defines the following functions:

  • def Sign(privkey: int, message: Bytes) -> BLSSignature
  • def Verify(pubkey: BLSPubkey, message: Bytes, signature: BLSSignature) -> bool
  • def Aggregate(signatures: Sequence[BLSSignature]) -> BLSSignature
  • def FastAggregateVerify(pubkeys: Sequence[BLSPubkey], message: Bytes, signature: BLSSignature) -> bool
  • def AggregateVerify(pubkeys: Sequence[BLSPubkey], messages: Sequence[Bytes], signature: BLSSignature) -> bool
  • def KeyValidate(pubkey: BLSPubkey) -> bool

The above functions are accessed through the bls module, e.g. bls.Verify.

The detailed specification of the cryptographic functions underlying Ethereum 2.0's BLS signing scheme is delegated to the draft IRTF standard[1] as described in the spec. This includes specifying the elliptic curve BLS12-381 as our domain of choice.

Our intention in conforming to the in-progress standard is to provide for maximal interoperability with other chains, applications, and cryptographic libraries. Ethereum Foundation researchers and Eth2 developers had input to the development of the standard. Nevertheless, there were some challenges involved in trying to keep up as the standard evolved. For example, the Hashing to Elliptic Curves standard was still changing rather late in the beacon chain testing phase. In the end, everything worked out fine.

The following two functions are described in the separate BLS Extensions document, but included here for convenience.


def eth_aggregate_pubkeys(pubkeys: Sequence[BLSPubkey]) -> BLSPubkey:
    """
    Return the aggregate public key for the public keys in ``pubkeys``.

    NOTE: the ``+`` operation should be interpreted as elliptic curve point addition, which takes as input
    elliptic curve points that must be decoded from the input ``BLSPubkey``s.
    This implementation is for demonstrative purposes only and ignores encoding/decoding concerns.
    Refer to the BLS signature draft standard for more information.
    """
    assert len(pubkeys) > 0
    # Ensure that the given inputs are valid pubkeys
    assert all(bls.KeyValidate(pubkey) for pubkey in pubkeys)

    result = copy(pubkeys[0])
    for pubkey in pubkeys[1:]:
        result += pubkey
    return result

Stand-alone aggregation of public keys is not defined by the BLS signature standard. In the standard, public keys are aggregated only in the context of performing an aggregate signature verification via AggregateVerify() or FastAggregateVerify().

The eth_aggregate_pubkeys() function was added in the Altair upgrade to implement an optimisation for light clients when verifying the signatures on SyncAggregates.

Used by get_next_sync_committee()
Uses bls.KeyValidate()


def eth_fast_aggregate_verify(pubkeys: Sequence[BLSPubkey], message: Bytes32, signature: BLSSignature) -> bool:
    """
    Wrapper to ``bls.FastAggregateVerify`` accepting the ``G2_POINT_AT_INFINITY`` signature when ``pubkeys`` is empty.
    """
    if len(pubkeys) == 0 and signature == G2_POINT_AT_INFINITY:
        return True
    return bls.FastAggregateVerify(pubkeys, message, signature)

The specification of FastAggregateVerify() in the BLS signature standard returns INVALID if there are zero public keys given.

This function was introduced in Altair to handle SyncAggregates that no sync committee member had signed off on, in which case the G2_POINT_AT_INFINITY can be considered a "correct" signature (in our case, but not according to the standard).

The networking and validator specs were later clarified to require that SyncAggregates have at least one signature. But this requirement is not enforced in the consensus layer (in process_sync_aggregate()), so we need to retain this eth_fast_aggregate_verify() wrapper to allow the empty signature to be valid.

Used by process_sync_aggregate()
Uses bls.FastAggregateVerify()

  1. This document does not have the full force of an IETF standard. For one thing, it remains a draft (that is now expired); for another, it is an IRTF document, meaning that it comes from a research group rather than being on the IETF standards track. Some context from Brian Carpenter, former IETF chair:

    I gather that you are referring to an issue in draft-irtf-cfrg-bls-signature-04. That is not even an IETF draft; it's an IRTF draft, apparently being discussed in an IRTF Research Group. So it is not even remotely under consideration to become an IETF standard...

Created by Ben Edgington. Licensed under CC BY-SA 4.0. Published 2023-09-29 14:16 UTC. Commit ebfcf50.