Part 2: Technical Overview

The Building Blocks

SSZ: Simple Serialize

First cut \checkmark Revision TODO
  • The beacon chain uses a novel serialisation method called Simple Serialize (SSZ).
  • After much debate we chose to use SSZ for both consensus and communication.
  • SSZ is not self-describing; you need to know in advance what you are deserialising.
  • An offset scheme allows fast access to subsets of the data.
  • SSZ plays nicely with Merkleization and generalised indices in Merkle proofs.

Introduction

Serialisation is the process of taking structured information (in our case, a data structure) and transforming it into a representation that can be stored or transmitted.

A cooking recipe is a kind of serialisation. I can write down a method for cooking something in such a way that you and others can recreate the method to cook the same thing. The recipe can be written in a book, appear online, even be spoken and memorised – this is serialisation. Using the recipe to cook something is deserialisation.

Serialisation is used for three main purposes on the beacon chain.

  1. Consensus: if you and I each have information in a data structure, such as the beacon state, how can we know if our data structures are the same or not? Serialisation allows us to answer this question, as long as all clients use the same method. Note that this is also bound up with Merkleization.
  2. Peer-to-peer communication: we need to exchange data structures over the Internet, such as attestations and blocks. We can't transmit structured data as-is, it must be serialised for transmission and deserialised at the other end. All clients must use the same p2p serialisation, but it doesn't need to be the same as the consensus serialisation.
  3. Similarly, data structures need to be serialised for users accessing a beacon node's API. Clients are free to choose their own API serialisation. For example, the Prysm client has an API that uses Protocol Buffers (which is being deprecated now that we have agreed a common API format that uses both SSZ and JSON).

In addition, data must be serialised before being written to disk. Each client is free to do this internally however they wish.

Ethereum 2.0 uses a bespoke serialisation scheme called Simple Serialize, or more commonly just "SSZ"1, for all of these purposes.

History

It seems like we spent months over the end of 2018 and the start of 2019 talking about serialisation, and the story below is highly simplified. But I think it's worth recording some of the considerations and design decisions.

Ethereum 1 has always used a serialisation format called RLP (recursive length prefix). This was deemed unsuitable for Ethereum 2, largely because it is regarded as overly complex.2

So, we had the freedom to choose a new serialisation protocol. What kind of decision points did we consider?

Serialisation for consensus

Starting with serialisation in the consensus protocol, the first big question was whether to adopt an existing off-the-shelf protocol or to roll our own.

One major issue with many existing schemes is that they do not guarantee that the serialisation is deterministic: they sometimes re-order fields in unpredictable ways. This makes them totally unsuitable for consensus; the same data must result in the same output every time.

A more general concern was around using third-party libraries in a consensus-critical situation. Back in 2014, Vitalik wrote a justification, titled Why not use X?, of Ethereum implementing its own technology (such as RLP) for so many things. Here's an excerpt:

One of our core principles in Ethereum is simplicity; the protocol should be as simple as possible, and the protocol should not contain any black boxes. Every single feature of every single sub-protocol should be precisely 100% documented on the whitepaper or wiki, and implemented using that as a specification.

Certainly, with respect to serialisation, some third-party libraries are far more generic than we need, which can lead to issues. Others don't map nicely to the data types that we want to use.

In view of these concerns momentum was in favour of adopting a bespoke, tightly specified serialisation method. It was the development of Merkleization on top of SSZ that cemented this, making SSZ (in some form) the clear leader for consensus serialisation.

Serialisation for communications

That decision made, the next big question was whether to use the same scheme for both consensus serialisation and peer-to-peer communications serialisation (the "wire-protocol"). This was finely balanced, and good arguments were made in favour of using Protocol Buffers for p2p communication and SSZ for consensus.

Discussion around this was extensive (see the references below), but we eventually decided to use SSZ for p2p communications.

The factors that tipped the balance in favour of SSZ for communications were (1) a desire to maintain only one serialisation library, and (2) some possible performance benefit.

On the first of these, there is a bias in Ethereum 2 to favour "simplicity over efficiency". Maintaining two serialisation libraries is arguably more overhead than any potential gain from using different ones. Having said that, RLP is still used in Eth2's discovery layer (since it is shared Eth1), so this argument loses some of its force.

On the second, when we receive an object over the wire, often the first thing we will want to do is to serialise it to calculate its data root for consensus. If we receive it already serialised in the right format then it saves a deserialise/reserialise round trip.

SSZ does not make any effort to compact or compress the serialised data, and there were concerns that this might make it inefficient for the wire transfer protocol. These concerns were alleviated by adding Snappy compression on the wire, as is already done in Ethereum 1.

SSZ development

SSZ is based on Ethereum's smart contract ABI, but with 4-byte position and size records rather than 32-byte, and different basic data types. It will immediately feel familiar to anyone who has fiddled with that. The rudiments of SSZ were laid down by Vitalik in August 2017.

The initial, more developed, spec for SSZ was merged into the beacon chain repository in October 2018, with the Container type being added a month later.

A big step forward in the utility of SSZ, and what established it as the serialisation protocol of choice for consensus, was the development of Merkleization (also known as tree hashing), first discussed in October 2018 and adopted into the spec in November.

Also in November 2018 we agreed to switch the byte ordering for integer types from big-endian to little-endian at the request of the Nimbus team. This means that the 32-bit number representing 66 decimal would is now serialised as 0x42000000 rather than 0x00000042. The main motivation for the change was to map better to byte-ordering in typical microprocessors.

April 2019 saw a major change to SSZ with the adoption of offsets. This came from a scheme, Simple Offset Serialisation, previously proposed by Péter Szilágyi. The idea is to split the objects we are serialising according to whether they are fixed length or variable length. The serialisation then has two sections. The first section contains both actual serialisations of any fixed length objects, and pointers (offsets) to the serialisations of any variable length objects. The second section contains the serialisations of the variable length objects. The motivation for this is to allow fast access to arbitrary parts of the serialised data without having to deserialise the whole structure.

There was one final substantial re-work of the SSZ spec in June 2019 in which SSZ lists were required to have a maximum length specified, and bitlist and bitvector types were added.

Overview

The specification of SSZ is maintained in the main consensus specs repo, and that's the place to go for all the details. I will only be presenting an introductory overview here, with a few examples.

The ultimate goal of SSZ is to be able to represent complex internal data structures such as the BeaconState as strings of bytes.

The formal properties that we require for SSZ to be useful for both consensus and communications areas defined in the SSZ formal verification exercise. Given objects O1O_1 and O2O_2, both of type TT, we require that SSZ be

  1. involutive: deserialiseT(serialiseT(O1))=O1\texttt{deserialise}\langle T \rangle(\texttt{serialise}\langle T \rangle(O_1)) = O_1 (required for communications), and
  2. injective: serialiseT(O1)=serialiseT(O2)\texttt{serialise}\langle T \rangle(O_1) = \texttt{serialise}\langle T \rangle(O_2) implies that O1=O2O_1 = O_2 (required for consensus).

The first property says that when we serialise an object of a certain type then deserialise the result, we end up with an object identical to the one we started with. This is essential for the communications protocol.

The second says that if we serialise two objects of the same type and get the same result then the two objects are identical. Equivalently, if we have two different objects of the same type then their serialisations will differ. This is essential for the consensus protocol.

Beyond those basic functional requirements, other goals for SSZ are to be (relatively) simple, to create (fairly) compact serialisations, and to be compatible with Merkleization. It is also useful to be able to quickly access specific bits of data within the serialisation without deserialising the entire object. The adoption of offsets into SSZ improved its performance in that respect.

Unlike RLP, SSZ is not self-describing. You can decode RLP data into a structured object without knowing in advance what that object looks like. This is not the case for SSZ: you must know in advance exactly what you are deserialising. In practice this has not been a problem for Eth2: we always know in advance what class of object a particular deserialised blob of data corresponds to. A consequence of this is that, while in RLP two objects of different types cannot serialise to the same output, in SSZ they can. We'll see an example of this shortly.

Specification

I don't plan to go into every last detail of SSZ – that's what the specification is for – rather, we'll take a general overview and then dive into a worked example.

The building blocks of SSZ are its basic types and its composite types.

Basic types

SSZ's basic types are very simple and limited, comprising only the following two classes.

  • Unsigned integers: a uintN is an N-bit unsigned integer, where N can be 8, 16, 32, 64, 128 or 256.
  • Booleans: a boolean is either True or False.

The serialisation of basic types lives up to the "simple" name:

  • uintN types are encoded as the little-endian representation in N/8 bytes. For example, the decimal number 12345 (0x3039 in hexadecimal) as a uint16 type is serialised as 0x3930 (two bytes). The same number as a uint32 type is serialised as 0x39300000 (four bytes).
  • boolean types are always one byte and serialised as 0x01 for true and 0x00 for false.

I have embedded some examples in the following descriptions. You can run them yourself if you set up the Eth2 spec as per the instructions in the Appendices. The examples can be run via the Python REPL or by putting the commands in a file (I show both approaches).

>>> from eth2spec.utils.ssz.ssz_typing import uint64, boolean
>>> uint64(0x0123456789abcdef).encode_bytes().hex()
'efcdab8967452301'
>>> boolean(True).encode_bytes().hex()
'01'
>>> boolean(False).encode_bytes().hex()
'00'
Composite types

Composite types hold combinations of or multiples of smaller types. The spec defines the following composite types: vectors, lists, bitvectors, bitlists, unions, and containers. I will skip unions in the following as they are not currently used in Ethereum 2.

Vectors

A vector is an ordered fixed-length homogeneous collection with exactly N values. "Homogeneous" means that all the elements of a vector must be of the same type, but they do not need to be of the same size. For example, we could have a vector containing lists that each have different numbers of elements.

In the SSZ spec a vector is denoted by Vector[type, N]. For example Vector[uint8, 32] is a 32 element list of uint8 types (bytes). The type can be anything, including other vectors or even containers.

Vectors provide a simple example of needing to know what kind of object you are deserialising before you attempt it. In the following example, the same string of bytes encodes both a four element set of two-byte integers, and an eight element set of one-byte integers. When we deserialise this we need to know which of these (or many other possibilities) we are expecting to get.

>>> from eth2spec.utils.ssz.ssz_typing import uint8, uint16, Vector
>>> Vector[uint16, 4](1, 2, 3, 4).encode_bytes().hex()
'0100020003000400'
>>> Vector[uint8, 8](1, 0, 2, 0, 3, 0, 4, 0).encode_bytes().hex()
'0100020003000400'

Fun fact: in early versions of the SSZ spec, vectors were called tuples.

Lists

A list is an ordered variable-length homogeneous collection with a maximum of N values.

In the SSZ spec a list is denoted by List[type, N]. For example, List[uint64, 100] is a list containing anywhere between zero and one hundred uint64 types.

The maximum length parameter, N, on lists is not used in serialisation or deserialisation. It is used, however, in Merkleization, and in particular enables generalised indices in Merkle proof generation.

Both vectors and lists have the same serialisation when they are treated as stand-alone objects:

>>> from eth2spec.utils.ssz.ssz_typing import uint8, List, Vector
>>> List[uint8, 100](1, 2, 3).encode_bytes().hex()
'010203'
>>> Vector[uint8, 3](1, 2, 3).encode_bytes().hex()
'010203'

So why not use lists everywhere? Since lists are variable sized objects in SSZ they are encoded differently from fixed sized vectors when contained within another object, so there is a small overhead. The container Foo holding the variable sized list is encoded with an extra four byte offset at the start. We'll see why a bit later.

>>> from eth2spec.utils.ssz.ssz_typing import uint8, Vector, List, Container
>>> class Foo(Container):
...     x: List[uint8, 3]
>>> class Bar(Container):
...     x: Vector[uint8, 3]
>>> Foo(x = [1, 2, 3]).encode_bytes().hex()
'04000000010203'
>>> Bar(x = [1, 2, 3]).encode_bytes().hex()
'010203'
Bitvectors

A bitvector is an ordered fixed-length collection of boolean values with N bits. In the SSZ spec, a bitvector is denoted by Bitvector[N].

It is not obvious from the spec, but bitvectors use little-endian bit format:

>>> from eth2spec.utils.ssz.ssz_typing import Bitvector
>>> Bitvector[8](0,0,0,0,0,0,0,1).encode_bytes().hex()
'80'

Bitvectors are encoded into the minimum necessary number of whole bytes (N // 8) and padded with zeroes in the high bits if N is not a multiple of 8.

As noted in the spec, functionally we could use either Vector[boolean, N] or Bitvector[N] to represent a list of bits. However, the latter will have a serialisation up to eight times shorter in practice since the former will use a whole byte per bit.

>>> from eth2spec.utils.ssz.ssz_typing import Vector, Bitvector, boolean
>>> Bitvector[5](1,0,1,0,1).encode_bytes().hex()
'15'
>>> Vector[boolean,5](1,0,1,0,1).encode_bytes().hex()
'0100010001'

The same consideration applies for lists and bitlists.

Bitlists

A bitlist is an ordered variable-length collection of boolean values with a maximum of N bits. In the SSZ spec, a bitlist is denoted by Bitlist[N].

An interesting feature of bitlists3 is that they use a sentinel bit to indicate the length of the list. The number of whole bytes in the bitlist is easily derived from the offsets in the serialisation, but that doesn't give us the precise number of bits. For example, in a naive scheme 13 bits would be serialised into two bytes, so we would only know that the actual list length is somewhere between 9 and 16 bits.

To resolve this problem, bitlist serialisation adds an extra 1 bit at the end of the list (which becomes the highest-order bit in the little-endian encoding). The exact length of the bitlist can then be found by ignoring any consecutive high-order zero bits and then stripping off the single sentinel bit.

As an example, this bitlist with three elements is encoded into a single byte. To deserialise this, we take the total length in bits (eight), skip the four high-order zero bits, skip the sentinel bit, and then our list comprises the remaining three bits. Equivalently, the bitlist length is the index of the highest 1 bit in the serialisation.

>>> from eth2spec.utils.ssz.ssz_typing import Bitlist
>>> Bitlist[100](0,0,0).encode_bytes().hex()
'08'

A diagram showing how the bitlist sentinel works

The sentinel bit indicates the end of the bitlist. All bits beyond the sentinel are zero.

As a consequence of the sentinel, we require an extra byte to serialise a bitlist if its actual length is a multiple of eight (irrespective of the maximum length). This is not the case for a bitvector.

>>> Bitlist[8](0,0,0,0,0,0,0,0).encode_bytes().hex()
'0001'
>>> Bitvector[8](0,0,0,0,0,0,0,0).encode_bytes().hex()
'00'
Containers

A container is an ordered heterogeneous collection of values. Basically, a container can contain any arbitrary mix of types, including containers.

We define containers using Python's dataclass notation with key–type pairs. For example, this is a Deposit container. In the following examples I have indicated the underlying types in the appended comments.

class Deposit(Container):
    proof: Vector[Bytes32, DEPOSIT_CONTRACT_TREE_DEPTH + 1] # Vector[Vector[uint8, 32], N]
    data: DepositData

The Deposit container contains a DepositData container which is defined as follows.

class DepositData(Container):
    pubkey: BLSPubkey                # Bytes48 / Vector[uint8, 48]
    withdrawal_credentials: Bytes32  # Vector[uint8, 32]
    amount: Gwei                     # uint64
    signature: BLSSignature          # Bytes96 / Vector[uint8, 96]

We'll see how containers are serialised in the worked example, below.

Fixed and variable size types

SSZ distinguishes between fixed size and variable size types, and treats them differently when they are contained within other types.

  • Variable size types are lists, bitlists, and any type that contains a variable size type.
  • Everything else is fixed size.

This distinction is important when we serialise a compound type. The serialised output is created in two parts, as follows.

  1. The serialisation of fixed length types, along with 32-bit offsets to any variable length types.
  2. The serialisation of any variable length types.

This split between a fixed length part and a variable length part came about as a result of the offset encoding described earlier: it allows fast access to specific fields within a serialised data structure without needing to deserialise the whole thing.

As an example, consider the following container. It has a single fixed length uint8 type, followed by a variable length List[uint8,10] type, followed again by a fixed length uint8.

>>> from eth2spec.utils.ssz.ssz_typing import uint8, List, Container
>>> class Baz(Container):
...     x: uint8
...     y: List[uint8, 10]
...     z: uint8
>>> Baz(x = 1, y = [2, 3], z = 4).encode_bytes().hex()
'0106000000040203'

We see that the serialisation contains an unexpected 06 byte and some zero bytes. To see where they come from I'll break down the output as follows, where the first column is the byte number in the serialised string.

Start of Part 1 (fixed size elements)
00 01       - The serialisation of x = uint8(1)
01 06000000 - A 32-bit offset to byte 6 (in little-endian format),
              the start of the serialisation of y
05 04       - The serialisation of z = uint8(4)

Start of Part 2 (variable size elements)
06 0203     - The serialisation of y = List[uint8, N]([2, 3])

In Part 1, instead of directly encoding the variable size list in place, it is replaced with a pointer (an offset) to its serialisation in Part 2. So, for any container, the size of Part 1 is known and fixed no matter what kinds of variable size types are present. The actual lengths of the variable size objects can be deduced from the offsets in Part 1 and the overall length of the serialisation string.

Diagram of the serialisation of the Baz container

Serialisation of the Baz container. Fixed size parts are done first, with an offset specified for the variable size List data.

It's not only containers that use this format, it applies to any type that contains variable size types. Here's a vector whose elements are lists. As an exercise for the reader I'll leave you to decode what's going on here.

>>> from eth2spec.utils.ssz.ssz_typing import uint8, List, Vector
>>> Vector[List[uint8,3],4]([1,2],[3,4,5],[],[6]).encode_bytes().hex()
'10000000120000001500000015000000010203040506'
Aliases

Just quoting directly from the SSZ spec here for completeness:

For convenience we alias:

  • bit to boolean
  • byte to uint8 (this is a basic type)
  • BytesN and ByteVector[N] to Vector[byte, N] (this is not a basic type)
  • ByteList[N] to List[byte, N]

In the main beacon chain spec, a bunch of custom types are also defined in terms of the standard SSZ types and aliases. For example, Slot is an SSZ uint64 type, BLSPubkey is an SSZ Bytes48 type, and so on.

Default values

Finally, each type has a default value. Once again directly from the SSZ spec:

Type Default Value
uintN 0
boolean False
Container [default(type) for type in container]
Vector[type, N] [default(type)] * N
Bitvector[N] [False] * N
List[type, N] []
Bitlist[N] []

Worked example

Let's explore a worked example to gather all of this together. I'd rather use a real example than make up a synthetic object, so we are going to look at the aggregate IndexedAttestation that was included in the beacon chain block at slot 3080831, at position 87 within the block. (It would actually have been an Attestation object in the block, but those bitlists are fiddly, so we'll look at the equivalent IndexedAttestation.)

The data structures

The IndexedAttestation container looks like this.

class IndexedAttestation(Container):
    attesting_indices: List[ValidatorIndex, MAX_VALIDATORS_PER_COMMITTEE]
    data: AttestationData
    signature: BLSSignature

It contains an AttestationData container,

class AttestationData(Container):
    slot: Slot
    index: CommitteeIndex
    beacon_block_root: Root
    source: Checkpoint
    target: Checkpoint

which in turn contains two Checkpoint containers,

class Checkpoint(Container):
    epoch: Epoch
    root: Root
The serialisation

Now we have enough information to build the IndexedAttestation object and calculate its SSZ serialisation.

from eth2spec.utils.ssz.ssz_typing import *
from eth2spec.altair import mainnet
from eth2spec.altair.mainnet import *

attestation = IndexedAttestation(
    attesting_indices = [33652, 59750, 92360],
    data = AttestationData(
        slot = 3080829,
        index = 9,
        beacon_block_root = '0x4f4250c05956f5c2b87129cf7372f14dd576fc152543bf7042e963196b843fe6',
        source = Checkpoint (
            epoch = 96274,
            root = '0xd24639f2e661bc1adcbe7157280776cf76670fff0fee0691f146ab827f4f1ade'
        ),
        target = Checkpoint(
            epoch = 96275,
            root = '0x9bcd31881817ddeab686f878c8619d664e8bfa4f8948707cba5bc25c8d74915d'
        )
    ),
    signature = '0xaaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3acc888d8af'
                + '5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856'
                + 'c2ddfc1a78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437'
)

print(attestation.encode_bytes().hex())

The resulting serialised blob of data that represents this IndexedAttestation object is (in hexadecimal):

e40000007d022f000000000009000000000000004f4250c05956f5c2b87129cf7372f14dd576fc15
2543bf7042e963196b843fe61278010000000000d24639f2e661bc1adcbe7157280776cf76670fff
0fee0691f146ab827f4f1ade13780100000000009bcd31881817ddeab686f878c8619d664e8bfa4f
8948707cba5bc25c8d74915daaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3ac
c888d8af5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856c2ddfc1a
78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437748300000000000066e90000
00000000c868010000000000

This can be transmitted as a string of bytes over the wire and, knowing at the other end that it represents an IndexedAttestation, reconstituted into an identical copy.

The serialisation unpacked

To make sense of this, we'll break down the serialisation into its parts. The first column is the byte-offset from the start of the byte string (in hexadecimal). Before each line I've indicated which part of the data structure it corresponds to, and I've translated the type aliases into their basic underlying SSZ types. Remember that all integer types are little-endian, so 7d022f0000000000 is the hexadecimal number 0x2f027d, which is 3080829 in decimal (the slot number).

Start of Part 1 (fixed size elements)
   4-byte offset to the variable length attestation.attesting_indices starting at 0xe4
00 e4000000

   attestation.data.slot: Slot / uint64
04 7d022f0000000000

   attestation.data.index: CommitteeIndex / uint64
0c 0900000000000000

   attestation.data.beacon_block_root: Root / Bytes32 / Vector[uint8, 32]
14 4f4250c05956f5c2b87129cf7372f14dd576fc152543bf7042e963196b843fe6

   attestation.data.source.epoch: Epoch / uint64
34 1278010000000000

   attestation.data.source.root: Root / Bytes32 / Vector[uint8, 32]
3c d24639f2e661bc1adcbe7157280776cf76670fff0fee0691f146ab827f4f1ade

   attestation.data.target.epoch: Epoch / uint64
5c 1378010000000000

   attestation.data.target.root: Root / Bytes32 / Vector[uint8, 32]
64 9bcd31881817ddeab686f878c8619d664e8bfa4f8948707cba5bc25c8d74915d

   attestation.signature: BLSSignature / Bytes96 / Vector[uint8, 96]
84 aaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3acc888d8af
a4 5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856
c4 c2ddfc1a78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437

Start of Part 2 (variable size elements)
   attestation.attesting_indices: List[uint64, MAX_VALIDATORS_PER_COMMITTEE]
e4 748300000000000066e9000000000000c868010000000000

The first thing to notice is that the attesting_indices list is variable size, so it is represented in Part 1 by an offset pointing to where the actual data is. In this case, at 0xe4 bytes (228 bytes) from the start of the serialised data. The actual length of the list can be calculated as the length of the whole string (252 bytes) minus 228 bytes (the start of the list) divided by 8 bytes, one per element. Thus we recover our list of three validator indices.

All the remaining items are fixed size, and are encoded in-place, including recursively encoding the fixed size AttestationData object, and its fixed size Checkpoint children.

Diagram of the serialisation of the IndexedAttestation container

Serialisation of the IndexedAttestation container.

Multiple variable size objects

It is instructive to see how container with multiple variable size child objects is serialised. For this example we will make an AttesterSlashing object that contains two of the above IndexedAttestation objects. This is a contrived example; the slashing report is not valid since the contents are duplicates.

An AttesterSlashing container is defined as follows,

class AttesterSlashing(Container):
    attestation_1: IndexedAttestation
    attestation_2: IndexedAttestation

which we can populate and serialise like this, using our previously defined IndexedAttestation object, attestation.

slashing = AttesterSlashing(
    attestation_1 = attestation,
    attestation_2 = attestation
)

print(slashing.encode_bytes().hex())

From this we get the following serialisation, again shown with the byte-offset within the byte string in the first column.

Start of Part 1 (fixed size elements)
0000 08000000
0004 04010000

Start of Part 2 (variable size elements)
0008 e40000007d022...
0104 e40000007d022...

This time we have two variable length types, so they are both replaced by offsets pointing to the start of the actual variable length data which appears in Part 2. The length of attestation_1 is calculated as the difference between the two offsets, and the length of attestation_2 is calculated as the length from its offset to the end of the string.

Another thing to note is that, since attestation_1 and attestation_2 are identical, their serialisations within this compound object are identical, including their internal offsets to their own variable length parts. That is, both attestations have variable length data at offset 0xe4 within their own serialisations; the offset is relative to the start of each sub-object's serialisation, not the entire string. This property simplifies recursive serialisation and deserialisation: a given object will have the same serialisation no matter what context it is found in.

Diagram of the serialisation of the AttesterSlashing container

Serialisation of the AttesterSlashing container.

See also

The SSZ specification is the authoritative source. There is also a curated list of SSZ implementations.

The historical discussion threads around whether to use SSZ for both consensus and p2p serialisation or not are a goldmine of insight and wisdom.

Other SSZ resources:

  • SSZ encoding diagrams by Protolambda.
  • Formal verification of the SSZ specification: Notes and Code.
  • An excellent SSZ explainer by Raul Jordan with a deep dive into implementing it in Golang. (Note that the specific library referenced in the article has now been deprecated in favour of fastssz.)
  • An interactive SSZ serialiser/deserialiser by ChainSafe with all the containers for Phase 0 and Altair available to play with. On the "Deserialize" tab you can paste the data from the IndexedAttestation above and verify that it deserialises correctly (you'll need to remove line breaks).

  1. Thus enshrining that ugly "z" in the full name, and the ghastly "ess-ess-zee" pronunciation.
  2. Vitalik, "As the inventor of RLP, I'm inclined to prefer SSZ", and again, "RLP honestly sucks" (with some explanation as to why!).
  3. Though not entirely uncontroversial. Basically, if the application layer already knows what length of bitlist it expects – which it generally does in Eth2, since although committee sizes change, the sizes are known – then we could in principle dispense with the sentinel bit.

Created by Ben Edgington. Licensed under CC BY-SA 4.0. Published 2023-07-01 13:17 UTC. Commit 8fa708b.