Part 2: Technical Overview
The Building Blocks
SSZ: Simple Serialize
- The beacon chain uses a novel serialisation method called Simple Serialize (SSZ).
- After much debate we chose to use SSZ for both consensus and communication.
- SSZ is not self-describing; you need to know in advance what you are deserialising.
- An offset scheme allows fast access to subsets of the data.
- SSZ plays nicely with Merkleization and generalised indices in Merkle proofs.
Introduction
Serialisation is the process of taking structured information (in our case, a data structure) and transforming it into a representation that can be stored or transmitted.
A cooking recipe is a kind of serialisation. I can write down a method for cooking something in such a way that you and others can recreate the method to cook the same thing. The recipe can be written in a book, appear online, even be spoken and memorised – this is serialisation. Using the recipe to cook something is deserialisation.
Serialisation is used for three main purposes on the beacon chain.
- Consensus: if you and I each have information in a data structure, such as the beacon state, how can we know if our data structures are the same or not? Serialisation allows us to answer this question, as long as all clients use the same method. Note that this is also bound up with Merkleization.
- Peer-to-peer communication: we need to exchange data structures over the Internet, such as attestations and blocks. We can't transmit structured data as-is; it must be serialised for transmission and deserialised at the other end. All clients must use the same p2p serialisation, but it doesn't need to be the same as the consensus serialisation.
- Similarly, data structures need to be serialised for users accessing a beacon node's API. Clients are free to choose their own API serialisation. For example, the Prysm client has an API that uses Protocol Buffers (which is being deprecated now that we have agreed a common API format that uses both SSZ and JSON).
In addition, data must be serialised before being written to disk. Each client is free to do this internally however they wish.
Ethereum 2.0 uses a bespoke serialisation scheme called Simple Serialize, or more commonly just "SSZ"1, for all of these purposes.
History
It seems like we spent months over the end of 2018 and the start of 2019 talking about serialisation, and the story below is highly simplified. But I think it's worth recording some of the considerations and design decisions.
Ethereum 1 has always used a serialisation format called RLP (recursive length prefix). This was deemed unsuitable for Ethereum 2, largely because it is regarded as overly complex.2
So, we had the freedom to choose a new serialisation protocol. What kind of decision points did we consider?
Serialisation for consensus
Starting with serialisation in the consensus protocol, the first big question was whether to adopt an existing off-the-shelf protocol or to roll our own.
One major issue with many existing schemes is that they do not guarantee that the serialisation is deterministic: they sometimes re-order fields in unpredictable ways. This makes them totally unsuitable for consensus; the same data must result in the same output every time.
A more general concern was around using third-party libraries in a consensus-critical situation. Back in 2014, Vitalik wrote a justification, titled Why not use X?, of Ethereum implementing its own technology (such as RLP) for so many things. Here's an excerpt:
One of our core principles in Ethereum is simplicity; the protocol should be as simple as possible, and the protocol should not contain any black boxes. Every single feature of every single sub-protocol should be precisely 100% documented on the whitepaper or wiki, and implemented using that as a specification.
Certainly, with respect to serialisation, some third-party libraries are far more generic than we need, which can lead to issues. Others don't map nicely to the data types that we want to use.
In view of these concerns momentum was in favour of adopting a bespoke, tightly specified serialisation method. It was the development of Merkleization on top of SSZ that cemented this, making SSZ (in some form) the clear leader for consensus serialisation.
Serialisation for communications
That decision made, the next big question was whether to use the same scheme for both consensus serialisation and peer-to-peer communications serialisation (the "wire-protocol"). This was finely balanced, and good arguments were made in favour of using Protocol Buffers for p2p communication and SSZ for consensus.
Discussion around this was extensive (see the references below), but we eventually decided to use SSZ for p2p communications.
The factors that tipped the balance in favour of SSZ for communications were (1) a desire to maintain only one serialisation library, and (2) some possible performance benefit.
On the first of these, there is a bias in Ethereum 2 to favour "simplicity over efficiency". Maintaining two serialisation libraries is arguably more overhead than any potential gain from using different ones. Having said that, RLP is still used in Eth2's discovery layer (since it is shared with Eth1), so this argument loses some of its force.
On the second, when we receive an object over the wire, often the first thing we will want to do is to serialise it to calculate its data root for consensus. If we receive it already serialised in the right format then it saves a deserialise/reserialise round trip.
SSZ does not make any effort to compact or compress the serialised data, and there were concerns that this might make it inefficient for the wire transfer protocol. These concerns were alleviated by adding Snappy compression on the wire, as is already done in Ethereum 1.
SSZ development
SSZ is based on Ethereum's smart contract ABI, but with 4-byte position and size records rather than 32-byte, and different basic data types. It will immediately feel familiar to anyone who has fiddled with that. The rudiments of SSZ were laid down by Vitalik in August 2017.
The initial, more developed, spec for SSZ was merged into the beacon chain repository in October 2018, with the Container type being added a month later.
A big step forward in the utility of SSZ, and what established it as the serialisation protocol of choice for consensus, was the development of Merkleization (also known as tree hashing), first discussed in October 2018 and adopted into the spec in November.
Also in November 2018 we agreed to switch the byte ordering for integer types from big-endian to little-endian at the request of the Nimbus team. This means that the 32-bit number representing 66 decimal is now serialised as 0x42000000 rather than 0x00000042. The main motivation for the change was to map better to byte-ordering in typical microprocessors.
April 2019 saw a major change to SSZ with the adoption of offsets. This came from a scheme, Simple Offset Serialisation, previously proposed by Péter Szilágyi. The idea is to split the objects we are serialising according to whether they are fixed length or variable length. The serialisation then has two sections. The first section contains both actual serialisations of any fixed length objects, and pointers (offsets) to the serialisations of any variable length objects. The second section contains the serialisations of the variable length objects. The motivation for this is to allow fast access to arbitrary parts of the serialised data without having to deserialise the whole structure.
There was one final substantial re-work of the SSZ spec in June 2019 in which SSZ lists were required to have a maximum length specified, and bitlist and bitvector types were added.
Overview
The specification of SSZ is maintained in the main consensus specs repo, and that's the place to go for all the details. I will only be presenting an introductory overview here, with a few examples.
The ultimate goal of SSZ is to be able to represent complex internal data structures such as the BeaconState as strings of bytes.
The formal properties that we require for SSZ to be useful for both consensus and communications are as defined in the SSZ formal verification exercise. Given objects $O_1$ and $O_2$, both of type $T$, we require that SSZ be
- involutive: $\mathrm{deserialise}_T(\mathrm{serialise}_T(O_1)) = O_1$ (required for communications), and
- injective: $\mathrm{serialise}_T(O_1) = \mathrm{serialise}_T(O_2)$ implies that $O_1 = O_2$ (required for consensus).
The first property says that when we serialise an object of a certain type then deserialise the result, we end up with an object identical to the one we started with. This is essential for the communications protocol.
The second says that if we serialise two objects of the same type and get the same result then the two objects are identical. Equivalently, if we have two different objects of the same type then their serialisations will differ. This is essential for the consensus protocol.
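As a toy illustration of these two properties, the following sketch checks them exhaustively for a stand-in serialiser of 16-bit unsigned integers. The functions `serialise` and `deserialise` are hypothetical helpers written for this example; they are not the spec's implementation, and a real check would need to cover every SSZ type.

```python
def serialise(value: int) -> bytes:
    """Toy serialiser: a 16-bit unsigned integer as two little-endian bytes."""
    return value.to_bytes(2, "little")


def deserialise(data: bytes) -> int:
    """Inverse of the toy serialiser above."""
    return int.from_bytes(data, "little")


values = range(2**16)

# First property: deserialising a serialisation gives back the original object
round_trips = all(deserialise(serialise(v)) == v for v in values)

# Second property: distinct objects never share a serialisation
injective = len({serialise(v) for v in values}) == len(values)

print(round_trips, injective)
```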
Beyond those basic functional requirements, other goals for SSZ are to be (relatively) simple, to create (fairly) compact serialisations, and to be compatible with Merkleization. It is also useful to be able to quickly access specific bits of data within the serialisation without deserialising the entire object. The adoption of offsets into SSZ improved its performance in that respect.
Unlike RLP, SSZ is not self-describing. You can decode RLP data into a structured object without knowing in advance what that object looks like. This is not the case for SSZ: you must know in advance exactly what you are deserialising. In practice this has not been a problem for Eth2: we always know in advance what class of object a particular serialised blob of data corresponds to. A consequence of this is that, while in RLP two objects of different types cannot serialise to the same output, in SSZ they can. We'll see an example of this shortly.
Specification
I don't plan to go into every last detail of SSZ – that's what the specification is for – rather, we'll take a general overview and then dive into a worked example.
The building blocks of SSZ are its basic types and its composite types.
Basic types
SSZ's basic types are very simple and limited, comprising only the following two classes.
- Unsigned integers: a uintN is an N-bit unsigned integer, where N can be 8, 16, 32, 64, 128 or 256.
- Booleans: a boolean is either True or False.
The serialisation of basic types lives up to the "simple" name:
- uintN types are encoded as the little-endian representation in N/8 bytes. For example, the decimal number 12345 (0x3039 in hexadecimal) as a uint16 type is serialised as 0x3930 (two bytes). The same number as a uint32 type is serialised as 0x39300000 (four bytes).
- boolean types are always one byte and serialised as 0x01 for true and 0x00 for false.
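The rules above are simple enough to sketch in plain Python without the spec library. The helper names `serialise_uint` and `serialise_boolean` are hypothetical, chosen for illustration only; they are not part of the spec or of eth2spec.

```python
def serialise_uint(value: int, bits: int) -> bytes:
    """Encode an unsigned integer as little-endian in bits / 8 bytes."""
    if bits not in (8, 16, 32, 64, 128, 256):
        raise ValueError("unsupported uintN width")
    # int.to_bytes raises OverflowError if the value is negative or too large
    return value.to_bytes(bits // 8, "little")


def serialise_boolean(value: bool) -> bytes:
    """Encode a boolean as a single byte: 0x01 for True, 0x00 for False."""
    return b"\x01" if value else b"\x00"


print(serialise_uint(12345, 16).hex())  # 3930
print(serialise_uint(12345, 32).hex())  # 39300000
print(serialise_boolean(True).hex())    # 01
```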
I have embedded some examples in the following descriptions. You can run them yourself if you set up the Eth2 spec as per the instructions in the Appendices. The examples can be run via the Python REPL or by putting the commands in a file (I show both approaches).
>>> from eth2spec.utils.ssz.ssz_typing import uint64, boolean
>>> uint64(0x0123456789abcdef).encode_bytes().hex()
'efcdab8967452301'
>>> boolean(True).encode_bytes().hex()
'01'
>>> boolean(False).encode_bytes().hex()
'00'
Composite types
Composite types hold combinations of or multiples of smaller types. The spec defines the following composite types: vectors, lists, bitvectors, bitlists, unions, and containers. I will skip unions in the following as they are not currently used in Ethereum 2.
Vectors
A vector is an ordered fixed-length homogeneous collection with exactly N values. "Homogeneous" means that all the elements of a vector must be of the same type, but they do not need to be of the same size. For example, we could have a vector containing lists that each have different numbers of elements.
In the SSZ spec a vector is denoted by Vector[type, N]. For example, Vector[uint8, 32] is a 32-element vector of uint8 types (bytes). The type can be anything, including other vectors or even containers.
Vectors provide a simple example of needing to know what kind of object you are deserialising before you attempt it. In the following example, the same string of bytes encodes both a four element set of two-byte integers, and an eight element set of one-byte integers. When we deserialise this we need to know which of these (or many other possibilities) we are expecting to get.
>>> from eth2spec.utils.ssz.ssz_typing import uint8, uint16, Vector
>>> Vector[uint16, 4](1, 2, 3, 4).encode_bytes().hex()
'0100020003000400'
>>> Vector[uint8, 8](1, 0, 2, 0, 3, 0, 4, 0).encode_bytes().hex()
'0100020003000400'
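The same ambiguity can be seen from the decoding side using only the Python standard library. This is a sketch of the idea rather than how any SSZ library works internally: the byte string carries no type information, so the caller chooses the interpretation.

```python
import struct

data = bytes.fromhex("0100020003000400")

# Interpreted as Vector[uint16, 4]: four little-endian two-byte integers
as_uint16 = list(struct.unpack("<4H", data))

# Interpreted as Vector[uint8, 8]: eight one-byte integers
as_uint8 = list(data)

print(as_uint16)  # [1, 2, 3, 4]
print(as_uint8)   # [1, 0, 2, 0, 3, 0, 4, 0]
```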
Fun fact: in early versions of the SSZ spec, vectors were called tuples.
Lists
A list is an ordered variable-length homogeneous collection with a maximum of N values.
In the SSZ spec a list is denoted by List[type, N]. For example, List[uint64, 100] is a list containing anywhere between zero and one hundred uint64 types.
The maximum length parameter, N, on lists is not used in serialisation or deserialisation. It is used, however, in Merkleization, and in particular enables generalised indices in Merkle proof generation.
Both vectors and lists have the same serialisation when they are treated as stand-alone objects:
>>> from eth2spec.utils.ssz.ssz_typing import uint8, List, Vector
>>> List[uint8, 100](1, 2, 3).encode_bytes().hex()
'010203'
>>> Vector[uint8, 3](1, 2, 3).encode_bytes().hex()
'010203'
So why not use lists everywhere? Since lists are variable sized objects in SSZ they are encoded differently from fixed sized vectors when contained within another object, so there is a small overhead. The container Foo holding the variable sized list is encoded with an extra four byte offset at the start. We'll see why a bit later.
>>> from eth2spec.utils.ssz.ssz_typing import uint8, Vector, List, Container
>>> class Foo(Container):
...     x: List[uint8, 3]
...
>>> class Bar(Container):
...     x: Vector[uint8, 3]
...
>>> Foo(x = [1, 2, 3]).encode_bytes().hex()
'04000000010203'
>>> Bar(x = [1, 2, 3]).encode_bytes().hex()
'010203'
Bitvectors
A bitvector is an ordered fixed-length collection of boolean values with N bits. In the SSZ spec, a bitvector is denoted by Bitvector[N].
It is not obvious from the spec, but bitvectors use little-endian bit format:
>>> from eth2spec.utils.ssz.ssz_typing import Bitvector
>>> Bitvector[8](0,0,0,0,0,0,0,1).encode_bytes().hex()
'80'
Bitvectors are encoded into the minimum necessary number of whole bytes ((N + 7) // 8) and padded with zeroes in the high bits if N is not a multiple of 8.
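To make the bit ordering and padding concrete, here is a minimal sketch of the packing in plain Python. The function name `serialise_bitvector` is hypothetical and the code is an illustration of the layout, not the spec's implementation: element i of the bitvector lands in bit (i % 8) of byte (i // 8), counting from the least significant bit.

```python
def serialise_bitvector(bits: list) -> bytes:
    """Pack bits into (N + 7) // 8 bytes, lowest-order bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    # Any unused high bits of the final byte are left as zero padding
    return bytes(out)


print(serialise_bitvector([0, 0, 0, 0, 0, 0, 0, 1]).hex())  # 80
print(serialise_bitvector([1, 0, 1, 0, 1]).hex())           # 15
```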
As noted in the spec, functionally we could use either Vector[boolean, N] or Bitvector[N] to represent a list of bits. However, the latter will have a serialisation up to eight times shorter in practice since the former will use a whole byte per bit.
>>> from eth2spec.utils.ssz.ssz_typing import Vector, Bitvector, boolean
>>> Bitvector[5](1,0,1,0,1).encode_bytes().hex()
'15'
>>> Vector[boolean,5](1,0,1,0,1).encode_bytes().hex()
'0100010001'
The same consideration applies for lists and bitlists.
Bitlists
A bitlist is an ordered variable-length collection of boolean values with a maximum of N bits. In the SSZ spec, a bitlist is denoted by Bitlist[N].
An interesting feature of bitlists3 is that they use a sentinel bit to indicate the length of the list. The number of whole bytes in the bitlist is easily derived from the offsets in the serialisation, but that doesn't give us the precise number of bits. For example, in a naive scheme 13 bits would be serialised into two bytes, so we would only know that the actual list length is somewhere between 9 and 16 bits.
To resolve this problem, bitlist serialisation adds an extra 1 bit at the end of the list (which becomes the highest-order bit in the little-endian encoding). The exact length of the bitlist can then be found by ignoring any consecutive high-order zero bits and then stripping off the single sentinel bit.
As an example, this bitlist with three elements is encoded into a single byte. To deserialise this, we take the total length in bits (eight), skip the four high-order zero bits, skip the sentinel bit, and then our list comprises the remaining three bits. Equivalently, the bitlist length is the index of the highest 1 bit in the serialisation.
>>> from eth2spec.utils.ssz.ssz_typing import Bitlist
>>> Bitlist[100](0,0,0).encode_bytes().hex()
'08'
As a consequence of the sentinel, we require an extra byte to serialise a bitlist if its actual length is a multiple of eight (irrespective of the maximum length). This is not the case for a bitvector.
>>> Bitlist[8](0,0,0,0,0,0,0,0).encode_bytes().hex()
'0001'
>>> Bitvector[8](0,0,0,0,0,0,0,0).encode_bytes().hex()
'00'
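The sentinel scheme can be sketched in plain Python as follows. The function names are hypothetical and this is an illustration under the rules described above, not the spec's code; in particular it does not enforce the maximum length N.

```python
def serialise_bitlist(bits: list) -> bytes:
    """Pack bits lowest-order first, with a sentinel 1 bit after the last element."""
    with_sentinel = list(bits) + [1]
    out = bytearray((len(with_sentinel) + 7) // 8)
    for i, bit in enumerate(with_sentinel):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def deserialise_bitlist(data: bytes) -> list:
    """Recover the bits; the list length is the index of the highest 1 bit."""
    if not data or data[-1] == 0:
        raise ValueError("missing sentinel bit in final byte")
    as_int = int.from_bytes(data, "little")
    length = as_int.bit_length() - 1  # position of the sentinel bit
    return [(as_int >> i) & 1 for i in range(length)]


print(serialise_bitlist([0, 0, 0]).hex())          # 08
print(serialise_bitlist([0] * 8).hex())            # 0001
print(deserialise_bitlist(bytes.fromhex("08")))    # [0, 0, 0]
```

Note how the eight-element list spills into a second byte purely to hold the sentinel.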
Containers
A container is an ordered heterogeneous collection of values. Basically, a container can contain any arbitrary mix of types, including containers.
We define containers using Python's dataclass notation with key–type pairs. For example, this is a Deposit container. In the following examples I have indicated the underlying types in the appended comments.
class Deposit(Container):
    proof: Vector[Bytes32, DEPOSIT_CONTRACT_TREE_DEPTH + 1]  # Vector[Vector[uint8, 32], N]
    data: DepositData
The Deposit container contains a DepositData container which is defined as follows.
class DepositData(Container):
    pubkey: BLSPubkey                # Bytes48 / Vector[uint8, 48]
    withdrawal_credentials: Bytes32  # Vector[uint8, 32]
    amount: Gwei                     # uint64
    signature: BLSSignature          # Bytes96 / Vector[uint8, 96]
We'll see how containers are serialised in the worked example, below.
Fixed and variable size types
SSZ distinguishes between fixed size and variable size types, and treats them differently when they are contained within other types.
- Variable size types are lists, bitlists, and any type that contains a variable size type.
- Everything else is fixed size.
This distinction is important when we serialise a compound type. The serialised output is created in two parts, as follows.
- The serialisation of fixed length types, along with 32-bit offsets to any variable length types.
- The serialisation of any variable length types.
This split between a fixed length part and a variable length part came about as a result of the offset encoding described earlier: it allows fast access to specific fields within a serialised data structure without needing to deserialise the whole thing.
As an example, consider the following container. It has a single fixed length uint8 type, followed by a variable length List[uint8,10] type, followed again by a fixed length uint8.
>>> from eth2spec.utils.ssz.ssz_typing import uint8, List, Container
>>> class Baz(Container):
...     x: uint8
...     y: List[uint8, 10]
...     z: uint8
...
>>> Baz(x = 1, y = [2, 3], z = 4).encode_bytes().hex()
'0106000000040203'
We see that the serialisation contains an unexpected 06 byte and some zero bytes. To see where they come from I'll break down the output as follows, where the first column is the byte number in the serialised string.
Start of Part 1 (fixed size elements)
00 01       - The serialisation of x = uint8(1)
01 06000000 - A 32-bit offset to byte 6 (in little-endian format),
              the start of the serialisation of y
05 04       - The serialisation of z = uint8(4)
Start of Part 2 (variable size elements)
06 0203     - The serialisation of y = List[uint8, N]([2, 3])
In Part 1, instead of directly encoding the variable size list in place, it is replaced with a pointer (an offset) to its serialisation in Part 2. So, for any container, the size of Part 1 is known and fixed no matter what kinds of variable size types are present. The actual lengths of the variable size objects can be deduced from the offsets in Part 1 and the overall length of the serialisation string.
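The two-part layout can be reproduced by hand for the Baz container. The following is a minimal sketch specific to that one container shape (a uint8, a list of uint8, a uint8); the function names are hypothetical, and a general implementation would derive the layout from the type definition rather than hard-coding it.

```python
OFFSET_SIZE = 4  # offsets are 32-bit little-endian integers


def serialise_baz(x: int, y: list, z: int) -> bytes:
    """Hand-rolled serialisation of a container with fields uint8, List[uint8, N], uint8."""
    part1_size = 1 + OFFSET_SIZE + 1  # x, the offset standing in for y, z
    part1 = (
        bytes([x])
        + part1_size.to_bytes(OFFSET_SIZE, "little")  # y starts right after Part 1
        + bytes([z])
    )
    part2 = bytes(y)  # each uint8 element serialises to one byte
    return part1 + part2


def deserialise_baz(data: bytes) -> tuple:
    """Read the fixed size fields directly, then follow the offset to find y."""
    x = data[0]
    y_offset = int.from_bytes(data[1:5], "little")
    z = data[5]
    # y is the only variable size field, so it runs to the end of the data
    y = list(data[y_offset:])
    return x, y, z


print(serialise_baz(1, [2, 3], 4).hex())                     # 0106000000040203
print(deserialise_baz(bytes.fromhex("0106000000040203")))    # (1, [2, 3], 4)
```

Notice that z can be read at a fixed position (byte 5) without looking at y at all, which is the fast-access benefit of the offset scheme.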
It's not only containers that use this format; it applies to any type that contains variable size types. Here's a vector whose elements are lists. As an exercise for the reader I'll leave you to decode what's going on here.
>>> from eth2spec.utils.ssz.ssz_typing import uint8, List, Vector
>>> Vector[List[uint8,3],4]([1,2],[3,4,5],[],[6]).encode_bytes().hex()
'10000000120000001500000015000000010203040506'
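For readers who would like to check their answer, here is a small sketch that splits such a serialisation using only its offsets. The helper `split_variable_items` is hypothetical and assumes every item is variable size, so that Part 1 consists solely of 4-byte offsets; it also assumes we already know there are four items, since SSZ is not self-describing.

```python
def split_variable_items(data: bytes, count: int) -> list:
    """Split the serialisation of `count` variable size items using their offsets."""
    offsets = [
        int.from_bytes(data[4 * i : 4 * i + 4], "little") for i in range(count)
    ]
    # Each item ends where the next one begins; the last ends at the end of the data
    ends = offsets[1:] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, ends)]


data = bytes.fromhex("10000000120000001500000015000000010203040506")
items = [list(chunk) for chunk in split_variable_items(data, 4)]
print(items)  # [[1, 2], [3, 4, 5], [], [6]]
```

The empty list shows up as two consecutive identical offsets (0x15): it occupies zero bytes in Part 2.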
Aliases
Just quoting directly from the SSZ spec here for completeness:
For convenience we alias:
- bit to boolean
- byte to uint8 (this is a basic type)
- BytesN and ByteVector[N] to Vector[byte, N] (this is not a basic type)
- ByteList[N] to List[byte, N]
In the main beacon chain spec, a bunch of custom types are also defined in terms of the standard SSZ types and aliases. For example, Slot is an SSZ uint64 type, BLSPubkey is an SSZ Bytes48 type, and so on.
Default values
Finally, each type has a default value. Once again directly from the SSZ spec:
Type | Default Value |
---|---|
uintN | 0 |
boolean | False |
Container | [default(type) for type in container] |
Vector[type, N] | [default(type)] * N |
Bitvector[N] | [False] * N |
List[type, N] | [] |
Bitlist[N] | [] |
Worked example
Let's explore a worked example to gather all of this together. I'd rather use a real example than make up a synthetic object, so we are going to look at the aggregate IndexedAttestation that was included in the beacon chain block at slot 3080831, at position 87 within the block. (It would actually have been an Attestation object in the block, but those bitlists are fiddly, so we'll look at the equivalent IndexedAttestation.)
The data structures
The IndexedAttestation container looks like this.
class IndexedAttestation(Container):
    attesting_indices: List[ValidatorIndex, MAX_VALIDATORS_PER_COMMITTEE]
    data: AttestationData
    signature: BLSSignature
It contains an AttestationData container,
class AttestationData(Container):
    slot: Slot
    index: CommitteeIndex
    beacon_block_root: Root
    source: Checkpoint
    target: Checkpoint
which in turn contains two Checkpoint containers,
class Checkpoint(Container):
    epoch: Epoch
    root: Root
The serialisation
Now we have enough information to build the IndexedAttestation object and calculate its SSZ serialisation.
from eth2spec.utils.ssz.ssz_typing import *
from eth2spec.altair import mainnet
from eth2spec.altair.mainnet import *

attestation = IndexedAttestation(
    attesting_indices = [33652, 59750, 92360],
    data = AttestationData(
        slot = 3080829,
        index = 9,
        beacon_block_root = '0x4f4250c05956f5c2b87129cf7372f14dd576fc152543bf7042e963196b843fe6',
        source = Checkpoint(
            epoch = 96274,
            root = '0xd24639f2e661bc1adcbe7157280776cf76670fff0fee0691f146ab827f4f1ade'
        ),
        target = Checkpoint(
            epoch = 96275,
            root = '0x9bcd31881817ddeab686f878c8619d664e8bfa4f8948707cba5bc25c8d74915d'
        )
    ),
    signature = '0xaaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3acc888d8af'
              + '5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856'
              + 'c2ddfc1a78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437'
)

print(attestation.encode_bytes().hex())
The resulting serialised blob of data that represents this IndexedAttestation object is (in hexadecimal):
e40000007d022f000000000009000000000000004f4250c05956f5c2b87129cf7372f14dd576fc15
2543bf7042e963196b843fe61278010000000000d24639f2e661bc1adcbe7157280776cf76670fff
0fee0691f146ab827f4f1ade13780100000000009bcd31881817ddeab686f878c8619d664e8bfa4f
8948707cba5bc25c8d74915daaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3ac
c888d8af5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856c2ddfc1a
78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437748300000000000066e90000
00000000c868010000000000
This can be transmitted as a string of bytes over the wire and, knowing at the other end that it represents an IndexedAttestation, reconstituted into an identical copy.
The serialisation unpacked
To make sense of this, we'll break down the serialisation into its parts. The first column is the byte-offset from the start of the byte string (in hexadecimal). Before each line I've indicated which part of the data structure it corresponds to, and I've translated the type aliases into their basic underlying SSZ types. Remember that all integer types are little-endian, so 7d022f0000000000 is the hexadecimal number 0x2f027d, which is 3080829 in decimal (the slot number).
Start of Part 1 (fixed size elements)
4-byte offset to the variable length attestation.attesting_indices starting at 0xe4
00 e4000000
attestation.data.slot: Slot / uint64
04 7d022f0000000000
attestation.data.index: CommitteeIndex / uint64
0c 0900000000000000
attestation.data.beacon_block_root: Root / Bytes32 / Vector[uint8, 32]
14 4f4250c05956f5c2b87129cf7372f14dd576fc152543bf7042e963196b843fe6
attestation.data.source.epoch: Epoch / uint64
34 1278010000000000
attestation.data.source.root: Root / Bytes32 / Vector[uint8, 32]
3c d24639f2e661bc1adcbe7157280776cf76670fff0fee0691f146ab827f4f1ade
attestation.data.target.epoch: Epoch / uint64
5c 1378010000000000
attestation.data.target.root: Root / Bytes32 / Vector[uint8, 32]
64 9bcd31881817ddeab686f878c8619d664e8bfa4f8948707cba5bc25c8d74915d
attestation.signature: BLSSignature / Bytes96 / Vector[uint8, 96]
84 aaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3acc888d8af
a4 5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856
c4 c2ddfc1a78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437
Start of Part 2 (variable size elements)
attestation.attesting_indices: List[uint64, MAX_VALIDATORS_PER_COMMITTEE]
e4 748300000000000066e9000000000000c868010000000000
The first thing to notice is that the attesting_indices list is variable size, so it is represented in Part 1 by an offset pointing to where the actual data is. In this case, at 0xe4 bytes (228 bytes) from the start of the serialised data. The actual length of the list can be calculated as the length of the whole string (252 bytes) minus 228 bytes (the start of the list), divided by 8 bytes per element. Thus we recover our list of three validator indices.
All the remaining items are fixed size, and are encoded in-place, including recursively encoding the fixed size AttestationData object, and its fixed size Checkpoint children.
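The breakdown above can be checked with a few lines of standard-library Python operating directly on the serialised blob. This is only a sketch: the `read_uint` helper is hypothetical, and the field positions are hard-coded from the table above rather than derived from the container definitions as a real SSZ library would do.

```python
blob = bytes.fromhex(
    "e40000007d022f000000000009000000000000004f4250c05956f5c2b87129cf7372f14dd576fc15"
    "2543bf7042e963196b843fe61278010000000000d24639f2e661bc1adcbe7157280776cf76670fff"
    "0fee0691f146ab827f4f1ade13780100000000009bcd31881817ddeab686f878c8619d664e8bfa4f"
    "8948707cba5bc25c8d74915daaf504503ff15ae86723c906b4b6bac91ad728e4431aea3be2e8e3ac"
    "c888d8af5dffbbcf53b234ea8e3fde67fbb09120027335ec63cf23f0213cc439e8d1b856c2ddfc1a"
    "78ed3326fb9b4fe333af4ad3702159dbf9caeb1a4633b752991ac437748300000000000066e90000"
    "00000000c868010000000000"
)


def read_uint(data: bytes, start: int, size: int) -> int:
    """Read a little-endian unsigned integer of `size` bytes at position `start`."""
    return int.from_bytes(data[start : start + size], "little")


# Fixed size fields can be read straight from their known positions in Part 1
offset = read_uint(blob, 0x00, 4)  # where attesting_indices begins
slot = read_uint(blob, 0x04, 8)
index = read_uint(blob, 0x0C, 8)
source_epoch = read_uint(blob, 0x34, 8)
target_epoch = read_uint(blob, 0x5C, 8)

# The list length is implied by the offset and the total length
count = (len(blob) - offset) // 8
attesting_indices = [read_uint(blob, offset + 8 * i, 8) for i in range(count)]

print(len(blob), offset, count)   # 252 228 3
print(slot, index, source_epoch, target_epoch)
print(attesting_indices)          # [33652, 59750, 92360]
```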
Multiple variable size objects
It is instructive to see how a container with multiple variable size child objects is serialised. For this example we will make an AttesterSlashing object that contains two of the above IndexedAttestation objects. This is a contrived example; the slashing report is not valid since the contents are duplicates.
An AttesterSlashing container is defined as follows,
class AttesterSlashing(Container):
    attestation_1: IndexedAttestation
    attestation_2: IndexedAttestation
which we can populate and serialise like this, using our previously defined IndexedAttestation object, attestation.
slashing = AttesterSlashing(
    attestation_1 = attestation,
    attestation_2 = attestation
)

print(slashing.encode_bytes().hex())
From this we get the following serialisation, again shown with the byte-offset within the byte string in the first column.
Start of Part 1 (fixed size elements)
0000 08000000
0004 04010000
Start of Part 2 (variable size elements)
0008 e40000007d022...
0104 e40000007d022...
This time we have two variable length types, so they are both replaced by offsets pointing to the start of the actual variable length data which appears in Part 2. The length of attestation_1 is calculated as the difference between the two offsets, and the length of attestation_2 is calculated as the length from its offset to the end of the string.
Another thing to note is that, since attestation_1 and attestation_2 are identical, their serialisations within this compound object are identical, including their internal offsets to their own variable length parts. That is, both attestations have variable length data at offset 0xe4 within their own serialisations; the offset is relative to the start of each sub-object's serialisation, not the entire string. This property simplifies recursive serialisation and deserialisation: a given object will have the same serialisation no matter what context it is found in.
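The same behaviour can be demonstrated on a much smaller scale. The sketch below is an analogue of the example above, not the AttesterSlashing itself: it reuses the eight-byte Baz serialisation from earlier and wraps two copies in a hypothetical outer container with two Baz fields, building the bytes by hand with the standard library.

```python
baz = bytes.fromhex("0106000000040203")  # Baz(x=1, y=[2, 3], z=4) from earlier

# Both fields of the (hypothetical) outer container are variable size, so
# Part 1 is just two 4-byte offsets and Part 2 holds the two serialisations.
part1_size = 2 * 4
offset_1 = part1_size
offset_2 = offset_1 + len(baz)
outer = (
    offset_1.to_bytes(4, "little")
    + offset_2.to_bytes(4, "little")
    + baz
    + baz
)
print(outer.hex())

# Recover each child from the outer offsets alone
first = outer[offset_1:offset_2]  # length is the difference between the offsets
second = outer[offset_2:]         # runs to the end of the data

# Each child's own offset to y is relative to the start of that child
inner_offset_1 = int.from_bytes(first[1:5], "little")
inner_offset_2 = int.from_bytes(second[1:5], "little")
print(inner_offset_1, inner_offset_2)  # 6 6
```

Both inner offsets are 6 even though the second child sits 16 bytes into the outer byte string.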
See also
The SSZ specification is the authoritative source. There is also a curated list of SSZ implementations.
The historical discussion threads around whether to use SSZ for both consensus and p2p serialisation or not are a goldmine of insight and wisdom.
- Possibly the first substantial discussion around which serialisation scheme to adopt. It covers various alternatives, touches on the p2p vs. consensus issues, and rehearses some of the desirable properties.
- An early discussion of SSZ went over some of the issues and led into the discussion below.
- Proposal to use SSZ for consensus only.
- Piper Merriam's Everything You Never Wanted To Know About Serialization remains a good summary of many of the considerations.
Other SSZ resources:
- SSZ encoding diagrams by Protolambda.
- Formal verification of the SSZ specification: Notes and Code.
- An excellent SSZ explainer by Raul Jordan with a deep dive into implementing it in Golang. (Note that the specific library referenced in the article has now been deprecated in favour of fastssz.)
- An interactive SSZ serialiser/deserialiser by ChainSafe with all the containers for Phase 0 and Altair available to play with. On the "Deserialize" tab you can paste the data from the IndexedAttestation above and verify that it deserialises correctly (you'll need to remove line breaks).
1. Thus enshrining that ugly "z" in the full name, and the ghastly "ess-ess-zee" pronunciation.
2. Vitalik, "As the inventor of RLP, I'm inclined to prefer SSZ", and again, "RLP honestly sucks" (with some explanation as to why!).
3. Though not entirely uncontroversial. Basically, if the application layer already knows what length of bitlist it expects – which it generally does in Eth2, since although committee sizes change, the sizes are known – then we could in principle dispense with the sentinel bit.