Hash Tree Roots and Merkleization
- A hash tree root provides a succinct cryptographic digest of an SSZ data structure.
- Calculating the hash tree root involves recursively Merkleizing the data structure.
- Merkleization is tightly coupled to SSZ and is defined in the same spec.
- The use of hash tree roots enables large parts of the beacon state to be cached, making it practical to operate with a monolithic beacon state.
- Eth2's Merkleization approach facilitates generalised indices and Merkle proofs which are important for light clients.
Introduction
While discussing SSZ, I asserted that serialisation is important for consensus without going into the details. In this section we will unpack that and take a deep dive into how Ethereum 2 nodes know that they share a view of the world.
Let's say that you and I want to compare our beacon states to see if we have an identical view of the state of the chain. One way we could do this is by serialising our respective beacon states and sending them to each other. We could then compare them byte-by-byte to check that they match. The problem with this is that the serialised beacon state at the time of writing is over 41MB in size and takes several seconds to transmit over the Internet. This is completely impractical for a global consensus protocol.
What we need is a digest of the state: a brief summary that is enough to determine with a very high degree of confidence whether you and I have the same state, or whether our states differ. The digest must also have the property that no one can fake it. That is, you can't convince me that you have the same state as I do while actually having a different state.
Thankfully, such digests exist in the form of cryptographic hash functions. These take a (potentially) large amount of input data and mangle it up into a small number of bytes, typically 32, that for all practical purposes uniquely fingerprint the data.
Armed with such a hash function^{1} we can improve on the previous idea. You and I separately serialise our beacon states and then hash (apply the hash function to) the resulting strings. Now we only need to exchange and compare our very short 32-byte hashes, which is much faster than sending all the data over the network. If they match then we have the same state; if they don't match then our states differ.
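As a sketch of the idea, using SHA-256 and small stand-in byte strings in place of real multi-megabyte serialised states:

```python
from hashlib import sha256

# Stand-ins for two serialised beacon states (real ones are tens of MB).
my_state = bytes(range(256)) * 1000
your_state = bytes(range(256)) * 1000

# Exchange and compare 32-byte digests instead of the full serialisations.
my_digest = sha256(my_state).digest()
your_digest = sha256(your_state).digest()
assert len(my_digest) == 32
assert my_digest == your_digest    # same state, same digest

# Changing even a single bit produces a completely different digest.
tampered = bytearray(your_state)
tampered[12345] ^= 1
assert sha256(bytes(tampered)).digest() != my_digest
```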
This process is very common, and was an early candidate for consensus purposes in Ethereum 2, though it was apparent fairly early that there might be better ways.
A problem with this approach is that, if you modify any part of the state – even a single bit – then you need to recalculate the hash of the entire serialised state. This is potentially a huge overhead. It was dealt with in early versions of the spec by splitting the beacon state into two parts: a slowly changing "crystallised" state that would rarely need rehashing, and a smaller fast-changing "active" state. However, this division of the state was a bit arbitrary and began to compromise some other parts of the design.
Ultimately, the split state approach was abandoned in favour of a method called "tree hashing", which is built on a technique called Merkleization^{2}. The remainder of this section explores this approach.
Tree hashing brings two significant advantages over other methods of creating a beacon state digest.
The first advantage is performance. On the face of it, tree hashing is quite inefficient since it requires hashing around double the amount of data to calculate the digest (the root) of a structure compared with the other method of simply hashing the entire serialisation. However, the way that hash trees are constructed in Ethereum 2 allows us to cache the roots of entire subtrees that have not changed. So, for example, by design the list of validator records in the state does not change frequently. As a result we can cache the hash tree root of the list and do not need to recalculate it every time we recalculate the entire beacon state root. Overall this feature results in a huge reduction in the total amount of hashing required to calculate state roots, and is an important part of making the beacon chain protocol usable in practice.
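As a sketch of why caching helps, using SHA-256 and a toy four-leaf tree standing in for the state: when only one leaf changes, only the hashes on its branch need recomputing, and cached subtree roots are reused as-is.

```python
from hashlib import sha256

def hash_pair(left: bytes, right: bytes) -> bytes:
    return sha256(left + right).digest()

A, B, C, D = (bytes([i]) * 32 for i in range(1, 5))

# Initial computation: hash everything, caching the subtree roots.
cached_ab = hash_pair(A, B)
cached_cd = hash_pair(C, D)
root = hash_pair(cached_ab, cached_cd)

# Now only leaf A changes. Recompute its branch; reuse cached_cd untouched.
A2 = bytes([9]) * 32
new_ab = hash_pair(A2, B)
new_root = hash_pair(new_ab, cached_cd)

# Two hashes instead of three; the saving grows with depth
# (O(log n) hashes per update instead of O(n)).
assert new_root == hash_pair(hash_pair(A2, B), hash_pair(C, D))
assert new_root != root
```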
The second advantage is light-client support. Indeed, supporting light clients was the original motivation for implementing tree hashing. Tree hashing enables efficient Merkle proofs that allow subsets of the beacon state to be provided to light clients. As long as a light client has obtained the hash tree root by some means, it can use the proofs to verify that the provided data is correct.
We will first recap Merkle trees, then extend them to Merkleization, and finally look at the construction of the hash tree root, which is our ultimate goal.
Terminology
The SSZ specification uses the term "Merkleization" to refer to both
- the operation of finding the root of a Merkle tree given its leaves, and
- the operation of finding the hash tree root of an SSZ object.
For didactic purposes I've chosen to distinguish between these more precisely. In the following sections I'll be calling the first "Merkleization", and the second "calculating a hash tree root".
With these definitions, calculating the hash tree root of an SSZ object uses Merkleization, potentially multiple steps of it, but also involves other steps such as packing, chunking, and length mixins. Moreover, Merkleization always works with full binary trees (the number of leaves is a power of two), whereas hash tree roots can be derived from much more complex binary tree structures.
Merkle Trees
To understand Merkleization we first need to understand Merkle trees. These are not at all new, and date back to the 1970s.
The idea is that we have a set of "leaves", which is our data, and we iteratively reduce those leaves down to a single, short root via hashing. This reduction is done by hashing the leaves in pairs to make a "parent" node. We repeat the process on the parent nodes to make grandparent nodes, and so on to build a binary tree structure that culminates in a single ancestral root. In Merkleization we will be dealing only with structures that have a power of two number of leaves, so we have a full binary tree.
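This pairwise reduction can be sketched in a few lines of Python, assuming SHA-256 as the hash function (which is what Eth2 uses):

```python
from hashlib import sha256

def hash_pair(left: bytes, right: bytes) -> bytes:
    """H(left + right): the hash of the concatenation of two nodes."""
    return sha256(left + right).digest()

def merkle_root(leaves):
    """Reduce a power-of-two number of leaves to a single 32-byte root."""
    assert len(leaves) > 0 and len(leaves) & (len(leaves) - 1) == 0
    level = list(leaves)
    while len(level) > 1:
        # Hash the nodes in pairs to build the parent level.
        level = [hash_pair(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Four 32-byte leaves A, B, C, D reduce to H(H(A+B) + H(C+D)).
A, B, C, D = (bytes([i]) * 32 for i in range(1, 5))
assert merkle_root([A, B, C, D]) == hash_pair(hash_pair(A, B), hash_pair(C, D))
```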
In the following diagram, the leaves are our four blobs of data, $A$, $B$, $C$, and $D$. These can be any string of data, though in Merkleization they will be 32 byte "chunks". The function $H$ is our hash function, and the operator $+$ concatenates strings. So $H(A+B)$ is the hash of the concatenation of strings $A$ and $B$^{3}.
Example of a Merkle tree.
In the Eth2 implementation, each box in the diagram is a 32-byte string of data: either a 32-byte leaf, or the 32-byte output of the hash function. Thus we obtain the 32-byte root of the tree, which is a "digest" of the data represented by the leaves. The root uniquely represents the data in the leaves; any change in the leaves leads to a different root.
Here's the same thing again on the Python REPL, assigning leaf values as $A=1$, $B=2$, $C=3$ and $D=4$. We construct the root of the tree starting from the leaves and descending through its levels until reaching the root, $H(H(A + B) + H(C + D))$. Note that all the leaf values are padded to 32 bytes and are little-endian (as per their SSZ serialisation).
>>> from eth2spec.utils.ssz.ssz_typing import uint256
>>> from eth2spec.utils.hash_function import hash
>>> a = uint256(1).to_bytes(length = 32, byteorder='little')
>>> b = uint256(2).to_bytes(length = 32, byteorder='little')
>>> c = uint256(3).to_bytes(length = 32, byteorder='little')
>>> d = uint256(4).to_bytes(length = 32, byteorder='little')
>>> ab = hash(a + b)
>>> cd = hash(c + d)
>>> abcd = hash(ab + cd)
>>> abcd.hex()
'bfe3c665d2e561f13b30606c580cb703b2041287e212ade110f0bfd8563e21bb'
Merkle tree constructions are a fairly common way to calculate a digest of a bunch of data, such as a blockchain state. Ethereum 1 uses a more sophisticated version of this called a hexary Merkle–Patricia trie (in Eth1 it's a "trie" not a "tree" for complicated reasons), though there are proposals to simplify that.
An extremely useful feature of Merkle trees is that it is quite efficient to construct inclusion proofs using them. This is critical functionality for light clients, and we will discuss it in depth when we look at Merkle proofs.
Merkleization
The normal way to implement a Merkle tree is to store the entire tree structure in memory or on disk, including all the intermediate levels between the leaves and the root. As leaves are updated the affected nodes in the tree are updated: changing $A$ means updating $H(A+B)$ and then the root; everything else is unchanged.
The difference with Merkleization is that the Merkle tree is computed on the fly from the given leaves. We can pick up where we left off from the last REPL session as follows.
>>> from eth2spec.utils.merkle_minimal import merkleize_chunks
>>> merkleize_chunks([a, b, c, d]).hex()
'bfe3c665d2e561f13b30606c580cb703b2041287e212ade110f0bfd8563e21bb'
The Merkleization function (called merkleize() in the SSZ spec, and merkleize_chunks() in the executable spec) takes a list of 32-byte chunks and returns the root of the tree for which those chunks are the leaves.
The list of chunks passed to merkleize_chunks() can be any length, but will be padded with zero chunks so that the total number of chunks is rounded up to the next whole power of two, such that we conceptually have a full binary tree. Thus a list of three chunks gets implicitly padded with an extra zero chunk:
>>> z = bytearray(32)
>>> merkleize_chunks([a, b, c]).hex()
'66c419026fee8793be7fd0011b9db46b98a79f9c9b640e25317865c358f442db'
>>> merkleize_chunks([a, b, c, z]).hex()
'66c419026fee8793be7fd0011b9db46b98a79f9c9b640e25317865c358f442db'
A larger tree width can be provided as a parameter to merkleize_chunks(), and the list will be padded with zero chunks accordingly. This capability is used when dealing with lists and bitlists.
>>> merkleize_chunks([a]).hex()
'0100000000000000000000000000000000000000000000000000000000000000'
>>> merkleize_chunks([a], 4).hex()
'553c8ccfd20bb4db224b1ae47359e9968a5c8098c15d8bf728b19e55749c773b'
>>> merkleize_chunks([a, z, z, z]).hex()
'553c8ccfd20bb4db224b1ae47359e9968a5c8098c15d8bf728b19e55749c773b'
An implementation can do this zero padding "virtually", and can optimise further by precomputing the various levels of hashes of zero chunks: $H(0 + 0)$, $H(H(0 + 0) + H(0 + 0))$, and so on. In this way we don't always need to build the whole tree to find the Merkle root.
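The steps above can be sketched as follows, assuming SHA-256 (Eth2's hash function); the merkleize function here is illustrative, not the spec's implementation. Zero-subtree roots are precomputed once, and the recursion substitutes them instead of materialising the padding.

```python
from hashlib import sha256

def hash_pair(left: bytes, right: bytes) -> bytes:
    return sha256(left + right).digest()

# zero_hashes[i] is the root of a depth-i subtree whose leaves are all zero
# chunks: H(0+0), H(H(0+0)+H(0+0)), and so on, computed once up front.
zero_hashes = [bytes(32)]
for _ in range(32):
    zero_hashes.append(hash_pair(zero_hashes[-1], zero_hashes[-1]))

def merkleize(chunks, depth):
    """Root of a depth-`depth` tree with `chunks` as its leftmost leaves.
    Zero padding is handled "virtually" via the precomputed zero_hashes."""
    if depth == 0:
        return chunks[0] if chunks else bytes(32)
    half = 1 << (depth - 1)
    if len(chunks) <= half:
        # The entire right subtree is zero chunks: reuse its precomputed root.
        return hash_pair(merkleize(chunks, depth - 1), zero_hashes[depth - 1])
    return hash_pair(merkleize(chunks[:half], depth - 1),
                     merkleize(chunks[half:], depth - 1))

# Same root as explicitly building the tree over [a, 0, 0, 0].
a = (1).to_bytes(32, 'little')
z = bytes(32)
assert merkleize([a], 2) == hash_pair(hash_pair(a, z), hash_pair(z, z))
```

This gives the same result as the merkleize_chunks([a], 4) call in the REPL session above, without ever allocating the three zero chunks.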
Note that the Merkleization of a single chunk is always just the chunk itself. This reduces the overall amount of hashing needed.
The Hash Tree Root
The hash tree root is a generalisation of Merkleization that we can apply to the kind of complex, compound data structures we have in the beacon state. Calculating hash tree roots is tightly connected to the type scheme of Simple Serialize.
Calculating the hash tree root of an SSZ object is recursive. Given a composite SSZ object, we iteratively move through the layers of its structure until we reach a basic type or collection of basic types that we can pack into chunks and Merkleize directly. Then we move back through the structure using the calculated hash tree roots as chunks themselves.
The process of calculating a hash tree root is defined in the Simple Serialize specification, and that's the place to go for the full details. However, in simplified form (once again ignoring the SSZ union type) there are basically two paths to choose from when finding an object's hash tree root.
- For basic types or collections of basic types (lists and vectors), we just pack and Merkleize directly.
- For containers and collections of composite types, we recursively find the hash tree roots of the contents.
The following two rules are a simplified summary of the first six rules listed in the specification.
- If X is an SSZ basic object, a list or vector of basic objects, or a bitlist or bitvector, then hash_tree_root(X) = merkleize_chunks(pack(X)). The pack() function returns a list of chunks that can be Merkleized directly.
- If X is an SSZ container, or a vector or list of composite objects, then the hash tree root is calculated recursively, hash_tree_root(X) = merkleize_chunks([hash_tree_root(x) for x in X]). The list comprehension is a list of hash tree roots, which is equivalent to a list of chunks.
We'll see plenty of concrete applications of these two rules in the worked example below.
Packing and Chunking
Merkleization operates on lists of "chunks", which are 32-byte blobs of data. Lists generated by means of rule 2 above are already in this form. However, rule 1 involves basic objects that require a "packing and chunking" operation prior to Merkleization.
The spec gives the precise rules, but it basically looks like this:
- The object (a basic type, a list/vector of basic types, or a bitlist/bitvector) is serialised via SSZ. The sentinel bit is omitted from the serialisation of bitlist types.
- The serialisation is right-padded with zero bytes up to the next full chunk (32 byte boundary).
- The result is split into a list of 32 byte chunks.
- If necessary, further (virtual) zero chunks will be appended to reach the following total lengths (only lists and bitlists might actually need extra padding):
  - All basic types give a single chunk; no basic type has a serialisation longer than 32 bytes.
  - Bitlist[N] and Bitvector[N]: (N + 255) // 256 chunks (dividing by the chunk size in bits and rounding up).
  - List[B, N] and Vector[B, N], where B is a basic type: (N * size_of(B) + 31) // 32 chunks (dividing by the chunk size in bytes and rounding up).
Containers and composite objects that result from rule 2 will have the following numbers of chunks, including zero-chunk padding where required for lists.
- List[C, N] and Vector[C, N], where C is a composite type: N, since the Merkleization comprises N hash tree roots.
- Containers: len(fields), since there is one hash tree root per field in the container.
It is not immediately obvious why lists and bitlists are padded with zero chunks up to their full maximum lengths, even if these are "virtual" chunks. However, this enables the use of generalised indices which provide a consistent way of creating Merkle proofs against hash tree roots, the topic of our next section.
Recall that, in addition to any padding added here, the Merkleization process will further pad the list with zero chunks to make it up to a power of two in length.
Mixing in the length
We want objects that have the same type but different contents to have different hash tree roots. This presents a problem for lists. Consider the list a of three elements, and the list b which is the same three elements plus a fourth zero element on the end. These are different lists of the same type, but both Merkleize to the same value.
>>> from eth2spec.utils.ssz.ssz_typing import uint256, List
>>> from eth2spec.utils.merkle_minimal import merkleize_chunks
>>> a = List[uint256, 4](1, 2, 3).encode_bytes()
>>> b = List[uint256, 4](1, 2, 3, 0).encode_bytes()
>>> merkleize_chunks([a[0:32], a[32:64], a[64:96]])
0x66c419026fee8793be7fd0011b9db46b98a79f9c9b640e25317865c358f442db
>>> merkleize_chunks([b[0:32], b[32:64], b[64:96], b[96:128]])
0x66c419026fee8793be7fd0011b9db46b98a79f9c9b640e25317865c358f442db
We need to ensure that a list ending with a zero value has a different hash tree root from the same list without the zero value. To do this, we put lists (and bitlists) through an extra mix_in_length() process that involves hashing a concatenation of the Merkle root of the list and the length of the list. This is equivalent to the Merkleization of two chunks, the first being the Merkle root of the list, the second being its length.
See the diagram for attesting_indices below for an illustration of this in practice.
Bitlists require a similar treatment since we remove the sentinel bit before Merkleizing.
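A sketch of the mix_in_length() idea using plain SHA-256 (Eth2's hash function), showing that the two lists above share a Merkle root until their lengths are mixed in:

```python
from hashlib import sha256

def hash_pair(left: bytes, right: bytes) -> bytes:
    return sha256(left + right).digest()

def mix_in_length(root: bytes, length: int) -> bytes:
    # Equivalent to Merkleizing two chunks: the list's Merkle root and its
    # length as a 32-byte little-endian value.
    return hash_pair(root, length.to_bytes(32, 'little'))

# The list [1, 2, 3] is padded with a zero chunk, giving the same four
# leaves as the list [1, 2, 3, 0].
chunks_a = [v.to_bytes(32, 'little') for v in (1, 2, 3)] + [bytes(32)]
chunks_b = [v.to_bytes(32, 'little') for v in (1, 2, 3, 0)]
assert chunks_a == chunks_b

root_a = hash_pair(hash_pair(chunks_a[0], chunks_a[1]), hash_pair(chunks_a[2], chunks_a[3]))
root_b = hash_pair(hash_pair(chunks_b[0], chunks_b[1]), hash_pair(chunks_b[2], chunks_b[3]))
assert root_a == root_b                                      # same Merkle root...
assert mix_in_length(root_a, 3) != mix_in_length(root_b, 4)  # ...distinguished by length
```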
Summaries and Expansions
The SSZ spec describes features of Merkleization called summaries and expansions. These are not explicit functions of Merkleization, but implicitly arise as consequences of the design.
Simply put, anywhere in the process, an entire SSZ object can be replaced with its hash tree root without affecting the final result.
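A toy demonstration of this property with a four-leaf tree (SHA-256 assumed):

```python
from hashlib import sha256

def hash_pair(left: bytes, right: bytes) -> bytes:
    return sha256(left + right).digest()

# A four-leaf tree: A, B form one subtree; C, D the other.
A, B, C, D = (bytes([i]) * 32 for i in range(1, 5))
full_root = hash_pair(hash_pair(A, B), hash_pair(C, D))

# "Summary": replace the (C, D) subtree with its root and recompute.
cd_root = hash_pair(C, D)
summary_root = hash_pair(hash_pair(A, B), cd_root)

# The overall root is unchanged, so caching subtree roots is sound.
assert summary_root == full_root
```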
We make use of this in a number of ways. First and foremost is the ability to cache the hash tree roots of any unchanged parts of the state, which makes it practical to recalculate the hash tree root of the whole state when required. For example, if a validator record is unchanged we do not need to recalculate its hash tree root when finding the root of the validator registry. If the validator registry is unchanged, we do not need to recalculate its hash tree root when calculating the full state root.
As another example, consider the BeaconBlock and BeaconBlockHeader types.
class BeaconBlock(Container):
    slot: Slot
    proposer_index: ValidatorIndex
    parent_root: Root
    state_root: Root
    body: BeaconBlockBody

class BeaconBlockHeader(Container):
    slot: Slot
    proposer_index: ValidatorIndex
    parent_root: Root
    state_root: Root
    body_root: Root
These differ only in their last fields, body and body_root respectively. If body_root is the hash tree root of the BeaconBlockBody, body, then these two objects will have exactly the same hash tree root. BeaconBlock is the expansion type of BeaconBlockHeader; BeaconBlockHeader is a summary type of BeaconBlock. Proposer slashing reports make use of this fact to save space by stripping out the block bodies and replacing them with their hash tree roots.
The Flashbots MEV-Boost design also makes use of this capability. In the MEV-Boost system validators are required to sign "blinded blocks": blocks for which they do not have the bodies. Since the header is a summary of the block (in the sense we are using the word summary here) the same signature will be valid both for the BeaconBlockHeader and the corresponding full BeaconBlock. This simplifies the protocol design.
Worked example
For this section's worked example we shall revisit our friend, the IndexedAttestation. This gives us nice instances of Merkleizing container types, list types, and vector types, as well as demonstrating summaries and expansions.
Recall that the IndexedAttestation type is defined as follows,
class IndexedAttestation(Container):
    attesting_indices: List[ValidatorIndex, MAX_VALIDATORS_PER_COMMITTEE]
    data: AttestationData
    signature: BLSSignature
We will create an instance of this just as we did previously, only for brevity I shall call it a, rather than attestation. We want to calculate the hash tree root of this IndexedAttestation, a.
A container's hash tree root is the Merkleization of the list of hash tree roots of the objects it contains (by rule 2). Diagrammatically we are building the following tree and finding its root.
Calculating the hash tree root of an IndexedAttestation. In this and the following diagrams, $R(X)$ is the Merkleization of $X$, and $S(X)$ is the SSZ serialisation of $X$. Each box is a 32 byte chunk, and the small digits are the number of leaves in the Merkleization operation.
Alternatively, in code, we have the following.
assert(a.hash_tree_root() == merkleize_chunks(
    [
        a.attesting_indices.hash_tree_root(),
        a.data.hash_tree_root(),
        a.signature.hash_tree_root()
    ]))
The merkleize_chunks() function is provided by the merkle_minimal.py library. We can apply this function directly as the hash tree roots in the list already constitute chunks. (We could also use the get_merkle_root() function, but then we'd have to specify a pad_to value of 4 to get a tree of the correct depth.)
The attesting_indices root
Working down the members of the list, we need the hash tree root of the attesting_indices object, which has type List[ValidatorIndex, MAX_VALIDATORS_PER_COMMITTEE]. This is a list of basic types, namely uint64 since that's the type of ValidatorIndex, and rule 1 applies.
Our attesting_indices list has three elements, [33652, 59750, 92360], which we need to chunk and pad. First we serialise the list as usual with SSZ, then we pad it up to 32 bytes:
>>> serialize(a.attesting_indices).hex()
'748300000000000066e9000000000000c868010000000000'
>>> (serialize(a.attesting_indices) + bytearray(8)).hex()
'748300000000000066e9000000000000c8680100000000000000000000000000'
This gives us our first chunk. However, the full number of chunks we need is 2048 // 4 = 512 (MAX_VALIDATORS_PER_COMMITTEE divided by the number of uint64s per chunk), so we must add 511 zero chunks. In practice this padding is done "virtually". The merkleize_chunks() function allows us to specify the full number of chunks and takes care of adding the extras. Behind the scenes, it is creating a ten-layer deep Merkle tree with our 512 chunks as leaves and returning the tree's root.
>>> merkleize_chunks([serialize(a.attesting_indices) + bytearray(8)], 512).hex()
'04e3bf0951474a6b06dd506648fdf8e84866542614e1c14fa832cd4bebfda0e3'
If this were a vector then our work would be done. However, when working with lists, there is a further little wrinkle: as a final step we need to concatenate the root that we have with the actual length of the list and hash them together. This is the mix_in_length() function described above, which we implement here by Merkleizing the list's Merkle root with the list's length.
assert(a.attesting_indices.hash_tree_root() ==
    merkleize_chunks(
        [
            merkleize_chunks([a.attesting_indices.encode_bytes() + bytearray(8)], 512),
            a.attesting_indices.length().to_bytes(32, 'little')
        ]))
In diagram form the hash tree root calculation for the list looks like this.
Calculating the hash tree root of the attesting_indices list. This is a List[uint64, 2048] type, and our example list has three elements, comprising a single chunk. Note the extra mix_in_length() step that's applied to lists.
The data root
The data field of the IndexedAttestation is another container, an AttestationData object, defined as,
class AttestationData(Container):
    slot: Slot
    index: CommitteeIndex
    beacon_block_root: Root
    source: Checkpoint
    target: Checkpoint
As before, to find the hash tree root of a container, by rule 2 we need the root of the roots it contains. That is,
assert(a.data.hash_tree_root() == merkleize_chunks(
    [
        a.data.slot.hash_tree_root(),
        a.data.index.hash_tree_root(),
        a.data.beacon_block_root.hash_tree_root(),
        a.data.source.hash_tree_root(),
        a.data.target.hash_tree_root()
    ]))
The Slot and the CommitteeIndex are just basic uint64 types. Their hash tree roots are their little-endian 256-bit representations.
>>> a.data.slot.hash_tree_root().hex()
'7d022f0000000000000000000000000000000000000000000000000000000000'
>>> a.data.index.hash_tree_root().hex()
'0900000000000000000000000000000000000000000000000000000000000000'
The Root is a Bytes32 type, which is equivalent to a Vector[uint8, 32]. Handily, the hash tree root is just the Root value itself since it's only a single chunk.
>>> a.data.beacon_block_root.hex()
'4f4250c05956f5c2b87129cf7372f14dd576fc152543bf7042e963196b843fe6'
>>> a.data.beacon_block_root.hash_tree_root().hex()
'4f4250c05956f5c2b87129cf7372f14dd576fc152543bf7042e963196b843fe6'
The source and target are once again containers, both having type Checkpoint. The Checkpoint type is simple to Merkleize with the knowledge we have. So, putting everything together, we can find the hash tree root of the data field by hand as follows.
assert(a.data.hash_tree_root() == merkleize_chunks(
    [
        a.data.slot.to_bytes(32, 'little'),
        a.data.index.to_bytes(32, 'little'),
        a.data.beacon_block_root,
        merkleize_chunks([a.data.source.epoch.to_bytes(32, 'little'), a.data.source.root]),
        merkleize_chunks([a.data.target.epoch.to_bytes(32, 'little'), a.data.target.root])
    ]))
Calculating the hash tree root of an AttestationData container. It contains in turn two Checkpoint containers, source and target.
The signature root
The final part of the IndexedAttestation we need to deal with is the signature field. This is of type BLSSignature, which is a Vector[uint8, 96], and rule 1 applies. This is simple to Merkleize as it is exactly three chunks when packed. The merkleize_chunks() function takes care of adding a single virtual zero chunk to make a power-of-two number of leaves.
assert(a.signature.hash_tree_root() ==
    merkleize_chunks([a.signature[0:32], a.signature[32:64], a.signature[64:96]]))
Calculating the hash tree root of a Signature, which is really a Bytes96, or Vector[uint8, 96] type.
Putting it all together
Assembling all these parts we can illustrate in both diagram form and code form how the hash tree root of the IndexedAttestation is calculated from the serialisation of the underlying basic types via repeated applications of Merkleization.
The full picture
Illustrating the steps required to calculate the hash tree root of an IndexedAttestation. The small digits are the number of leaves in each Merkleization operation.
The full code
The following code illustrates all the points from the worked example. You can run it by setting up the executable spec as described in the Appendix. If everything goes well, the only thing it should print is Success!.
from eth2spec.altair import mainnet
from eth2spec.altair.mainnet import *
from eth2spec.utils.ssz.ssz_typing import *
from eth2spec.utils.merkle_minimal import merkleize_chunks

# Initialise an IndexedAttestation type
a = IndexedAttestation(
    attesting_indices = [33652, 59750, 92360],
    data = AttestationData(
        slot = 3080829,
        index = 9,
        beacon_block_root = '0x4f4250c05956f5c2b87129cf7372f14dd576fc152543bf7042e963196b843fe6',
        source = Checkpoint(
            epoch = 96274,
            root = '0xd24639f2e661bc1adcbe7157280776cf76670fff0fee0691f146ab827f4f1ade'
        ),
        target = Checkpoint(
            epoch = 96275,
            root = '0x9bcd31881817ddeab686f878c8619d664e8bfa4f8948707cba5bc25c8d74915d'
        )
    ),
    signature = '0xaaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3acc888d8af'
              + '5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856'
              + 'c2ddfc1a78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437'
)

# A container's root is the merkleization of the roots of its fields.
# This is IndexedAttestation.
assert(a.hash_tree_root() == merkleize_chunks(
    [
        a.attesting_indices.hash_tree_root(),
        a.data.hash_tree_root(),
        a.signature.hash_tree_root()
    ]))

# A list is serialised then (virtually) padded to its full number of chunks before Merkleization.
# Finally its actual length is mixed in via a further hash/merkleization.
assert(a.attesting_indices.hash_tree_root() ==
    merkleize_chunks(
        [
            merkleize_chunks([a.attesting_indices.encode_bytes() + bytearray(8)], 512),
            a.attesting_indices.length().to_bytes(32, 'little')
        ]))

# A container's root is the merkleization of the roots of its fields.
# This is AttestationData.
assert(a.data.hash_tree_root() == merkleize_chunks(
    [
        a.data.slot.hash_tree_root(),
        a.data.index.hash_tree_root(),
        a.data.beacon_block_root.hash_tree_root(),
        a.data.source.hash_tree_root(),
        a.data.target.hash_tree_root()
    ]))

# Expanding the above AttestationData roots by "manually" calculating the roots of its fields.
assert(a.data.hash_tree_root() == merkleize_chunks(
    [
        a.data.slot.to_bytes(32, 'little'),
        a.data.index.to_bytes(32, 'little'),
        a.data.beacon_block_root,
        merkleize_chunks([a.data.source.epoch.to_bytes(32, 'little'), a.data.source.root]),
        merkleize_chunks([a.data.target.epoch.to_bytes(32, 'little'), a.data.target.root]),
    ]))

# The Signature type has a simple Merkleization.
assert(a.signature.hash_tree_root() ==
    merkleize_chunks([a.signature[0:32], a.signature[32:64], a.signature[64:96]]))

# Putting everything together, we have a "by-hand" Merkleization of the IndexedAttestation.
assert(a.hash_tree_root() == merkleize_chunks(
    [
        # a.attesting_indices.hash_tree_root()
        merkleize_chunks(
            [
                merkleize_chunks([a.attesting_indices.encode_bytes() + bytearray(8)], 512),
                a.attesting_indices.length().to_bytes(32, 'little')
            ]),
        # a.data.hash_tree_root()
        merkleize_chunks(
            [
                a.data.slot.to_bytes(32, 'little'),
                a.data.index.to_bytes(32, 'little'),
                a.data.beacon_block_root,
                merkleize_chunks([a.data.source.epoch.to_bytes(32, 'little'), a.data.source.root]),
                merkleize_chunks([a.data.target.epoch.to_bytes(32, 'little'), a.data.target.root]),
            ]),
        # a.signature.hash_tree_root()
        merkleize_chunks([a.signature[0:32], a.signature[32:64], a.signature[64:96]])
    ]))

print("Success!")
See also
The SSZ specification is the authoritative source for Merkleization as well as serialisation. Many SSZ implementations also include Merkleization.
A formal verification of Merkleization has been performed: Notes and Code.
The Remerkleable library is a Python implementation that introduces some more advanced tools such as backing trees for the data structures. Ztyp is a further exploration of backing trees. Backing trees are a useful approach to representing and maintaining the beacon state within client implementations.
Given the limited type of hashing that's done during Merkleization (always hashing the concatenation of two 32 byte strings), it's worth looking into whether specific performance optimisations are available. Potuz has done this, and produced an optimised library, Hashtree.

1. See the Annotated Spec for the saga of Eth2's hash function, and how we ended up with SHA256. ↩
2. The name Merkleization derives from Merkle trees, which in turn are named for the computer scientist Ralph Merkle. I believe the noun "Merkleization", though, is ours. I've adopted the majority preferred spelling, which is also the version that made it into the SSZ spec. The ugly version won despite my best efforts. ↩
3. For some reason, in computer science, trees are traditionally depicted the other way up. Call me eccentric, but I like my trees to have their leaves at the top and their roots at the bottom. ↩