Update ledger replication chapter (#2029)
* ledger block -> ledger segment

  The book already defines a *block* to be a slight variation of how
  block-based chains define it. It's the thing the cluster confirms should be
  the next set of transactions on the ledger.

* Boot storage description from the book
@@ -1,4 +1,4 @@
-# Fullnode
+# Anatomy of a Fullnode

 <img alt="Fullnode block diagrams" src="img/fullnode.svg" class="center"/>

@@ -1,114 +1,2 @@
 # Ledger Replication

-## Background
-
-At full capacity on a 1gbps network Solana would generate 4 petabytes of data
-per year. If each fullnode was required to store the full ledger, the cost of
-storage would discourage fullnode participation, thus centralizing the network
-around those that could afford it. Solana aims to keep the cost of a fullnode
-below $5,000 USD to maximize participation. To achieve that, the network needs
-to minimize redundant storage while at the same time ensuring the validity and
-availability of each copy.
-
-To trust storage of ledger segments, Solana has *replicators* periodically
-submit proofs to the network that the data was replicated. Each proof is called
-a Proof of Replication. The basic idea of it is to encrypt a dataset with a
-public symmetric key and then hash the encrypted dataset. Solana uses [CBC
-encryption](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_Block_Chaining_(CBC)).
-To prevent a malicious replicator from deleting the data as soon as it's
-hashed, a replicator is required to hash random segments of the dataset.
-Alternatively, Solana could require hashing the reverse of the encrypted data,
-but random sampling is sufficient and much faster. Either solution ensures
-that all the data is present during the generation of the proof and also
-requires the validator to have the entirety of the encrypted data present for
-verification of every proof of every identity. The space required to validate
-is:
-
-``` number_of_proofs * data_size ```
-
-## Optimization with PoH
-
-Solana is not the only distributed systems project using Proof of Replication,
-but it might be the most efficient implementation because of its ability to
-synchronize nodes with its Proof of History. With PoH, Solana is able to record
-a hash of the PoRep samples in the ledger. Thus the blocks stay in the exact
-same order for every PoRep and verification can stream the data and verify all
-the proofs in a single batch. This way Solana can verify multiple proofs
-concurrently, each one on its own GPU core. With the current generation of
-graphics cards our network can support up to 14,000 replication identities or
-symmetric keys. The total space required for verification is:
-
-``` 2 CBC_blocks * number_of_identities ```
-
-with a core count equal to the number of identities. A CBC block is expected to
-be 1MB in size.
-
-## Network
-
-Validators for PoRep are the same validators that are verifying transactions.
-They have put up some stake as collateral to ensure that their work is honest.
-If you can prove that a validator verified a fake PoRep, then the validator's
-stake is slashed.
-
-Replicators are specialized light clients. They download a part of the ledger,
-store it, and provide proofs of storing the ledger. For each verified proof,
-replicators are rewarded tokens from the mining pool.
-
-## Constraints
-
-Solana's PoRep protocol introduces the following constraints:
-
-* At most 14,000 replication identities can be used, because that is how many
-GPU cores are currently available to a computer costing under $5,000 USD.
-* Verification requires generating the CBC blocks. That requires space of 2
-blocks per identity, and 1 GPU core per identity for the same dataset.
-Identities are batched so that as many proofs as possible for those identities
-are verified concurrently for the same dataset.
-
-## Validation and Replication Protocol
-
-1. The network sets a replication target number, let's say 1k. 1k PoRep
-identities are created from signatures of a PoH hash, and are tied to that
-specific PoH hash. It doesn't matter who creates them; they could simply be
-the last 1k validation signatures seen for the ledger at that count. This may
-be just the initial batch of identities, because identity rotation is
-staggered.
-2. Any client can use any of these identities to create PoRep proofs.
-Replicator identities are the CBC encryption keys.
-3. Periodically at a specific PoH count, a replicator that wants to create
-PoRep proofs signs the PoH hash at that count. That signature is the seed
-used to pick the block and identity to replicate. A block is 1TB of ledger.
-4. Periodically at a specific PoH count, a replicator submits PoRep proofs for
-their selected block. A signature of the PoH hash at that count is the seed
-used to sample the 1TB encrypted block, and hash it. This is done faster than
-it takes to encrypt the 1TB block with the original identity.
-5. Replicators must submit some number of fake proofs, which they can prove to
-be fake by providing the seed for the hash result.
-6. Periodically at a specific PoH count, validators sign the hash and use the
-signature to select the 1TB block that they need to validate. They batch all
-the identities and proofs and submit approval for all the verified ones.
-7. After step 6, replicator clients submit the proofs of fake proofs.
-
-For any random seed, Solana requires everyone to use a signature that is
-derived from a PoH hash. Every node uses the same count so that the same PoH
-hash is signed by every participant. The signatures are then each
-cryptographically tied to the keypair, which prevents a leader from grinding on
-the resulting value for more than 1 identity.
-
-Key rotation is *staggered*. Once going, the next identity is generated by
-hashing itself with a PoH hash.
-
-Since there are many more client identities than encryption identities, the
-reward is split among multiple clients to prevent Sybil attacks from
-generating many clients to acquire the same block of data. To remain BFT, the
-network needs to prevent a single entity from storing all the replications of
-a single chunk of the ledger.
-
-Solana's solution to this is to require clients to continue using the same
-identity. If the first round is used to acquire the same block for many client
-identities, the second round for the same client identities will require a
-redistribution of the signatures, and therefore PoRep identities and blocks.
-Thus to get a reward for storage, clients are not rewarded for storage of the
-first block. The network rewards long-lived client identities more than new
-ones.
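The seed-derivation and staggered key-rotation rules in the removed protocol section can be sketched as a toy Python model. Everything here is illustrative: sha256 stands in for the real signature scheme and for however the seed is reduced to indices, and the helper names (`pick_assignment`, `rotate_identity`) are hypothetical, not part of any Solana API.

```python
import hashlib


def pick_assignment(signature: bytes, num_segments: int, num_identities: int):
    """The signature over the PoH hash acts as the random seed that picks
    which ledger block to store and which PoRep identity to use."""
    seed = int.from_bytes(hashlib.sha256(signature).digest(), "big")
    return seed % num_segments, (seed // num_segments) % num_identities


def rotate_identity(identity: bytes, poh_hash: bytes) -> bytes:
    # Staggered key rotation: the next identity is generated by hashing
    # the current identity together with a PoH hash.
    return hashlib.sha256(identity + poh_hash).digest()


sig = b"replicator signature over the PoH hash"  # placeholder bytes
segment, ident = pick_assignment(sig, num_segments=4_000, num_identities=1_000)
assert 0 <= segment < 4_000 and 0 <= ident < 1_000
```

Because the seed is a signature over a PoH hash that every node signs at the same count, it is deterministic to verifiers yet cryptographically tied to one keypair, which is what prevents a leader from grinding the result.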
@@ -145,8 +145,9 @@ The public key of a [keypair](#keypair).

 #### replicator

-A type of [client](#client) that stores copies of segments of the
-[ledger](#ledger).
+A type of [client](#client) that stores [ledger](#ledger) segments and
+periodically submits storage proofs to the cluster; not a
+[fullnode](#fullnode).

 #### secret key

@@ -154,8 +155,8 @@ The private key of a [keypair](#keypair).

 #### slot

-The time (i.e. number of [blocks](#block)) for which a [leader](#leader) ingests
-transactions and produces [entries](#entry).
+The time (i.e. number of [blocks](#block)) for which a [leader](#leader)
+ingests transactions and produces [entries](#entry).

 #### sol

@@ -215,13 +216,29 @@ for potential future use.

 A fraction of a [block](#block); the smallest unit sent between
 [fullnodes](#fullnode).

+#### CBC block
+
+Smallest encrypted chunk of ledger; an encrypted ledger segment is made of
+many CBC blocks, `ledger_segment_size / cbc_block_size` to be exact.
+
 #### curio

 A scarce, non-fungible member of a set of curios.

 #### epoch

-The time, i.e. number of [slots](#slot), for which a [leader schedule](#leader-schedule) is valid.
+The time, i.e. number of [slots](#slot), for which a [leader
+schedule](#leader-schedule) is valid.

+#### fake storage proof
+
+A proof with the same format as a storage proof, but whose sha state comes
+from hashing a known ledger value, which the storage client can reveal and
+which the network can easily verify on-chain.
+
+#### ledger segment
+
+A sequence of [blocks](#block).
+
 #### light client
@@ -237,6 +254,37 @@ Millions of [instructions](#instruction) per second.

 The component of a [fullnode](#fullnode) responsible for [program](#program)
 execution.

+#### storage proof
+
+A set of SHA hash states which is constructed by sampling the encrypted version
+of the stored [ledger segment](#ledger-segment) at certain offsets.
+
+#### storage proof challenge
+
+A [transaction](#transaction) from a [replicator](#replicator) that verifiably
+proves that a [validator](#validator) [confirmed](#storage-proof-confirmation)
+a [fake proof](#fake-storage-proof).
+
+#### storage proof claim
+
+A [transaction](#transaction) from a [validator](#validator), submitted after
+the timeout period following a [storage proof
+confirmation](#storage-proof-confirmation) during which no successful
+[challenges](#storage-proof-challenge) were observed, that rewards the parties
+of the [storage proofs](#storage-proof) and confirmations.
+
+#### storage proof confirmation
+
+A [transaction](#transaction) from a [validator](#validator) which indicates
+the set of [real](#storage-proof) and [fake proofs](#fake-storage-proof)
+submitted by a [replicator](#replicator). The transaction would contain a list
+of proof hash values and a bit which says if each hash is valid or fake.
+
+#### storage validation capacity
+
+The number of keys and samples that a [validator](#validator) can verify each
+storage epoch.
+
 #### thin client

 A type of [client](#client) that trusts it is communicating with a valid
@@ -1,11 +1,19 @@
-# Storage
+# Ledger Replication

-The goal of this RFC is to define a protocol for storing a very large ledger
-over a p2p network that is verified by solana validators. At full capacity on
-a 1gbps network solana will generate 4 petabytes of data per year. To prevent
-the network from centralizing around full nodes that have to store the full
-data set this protocol proposes a way for mining nodes to provide storage
-capacity for pieces of the network.
+At full capacity on a 1gbps network solana will generate 4 petabytes of data
+per year. To prevent the network from centralizing around full nodes that have
+to store the full data set, this protocol proposes a way for mining nodes to
+provide storage capacity for pieces of the network.
+
+The basic idea of Proof of Replication is to encrypt a dataset with a public
+symmetric key using CBC encryption, then hash the encrypted dataset. The main
+problem with the naive approach is that a dishonest storage node can stream the
+encryption and delete the data as it's hashed. The simple solution is to force
+the hash to be done on the reverse of the encryption, or perhaps with a random
+order. This ensures that all the data is present during the generation of the
+proof, and it also requires the validator to have the entirety of the encrypted
+data present for verification of every proof of every identity. So the space
+required to validate is `number_of_proofs * data_size`.

 ## Definitions

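The naive scheme described in the intro above (encrypt with a symmetric key, then hash the whole encrypted copy) can be sketched as follows. This is a toy model: the XOR-chain "CBC", the sizes, and the helper names are illustrative stand-ins, not the chacha CBC encryption the protocol actually uses.

```python
import hashlib


def toy_cbc_encrypt(data: bytes, key: bytes, block_size: int = 64) -> bytes:
    """Toy CBC-style chain: XOR each plaintext block with the previous
    ciphertext block and a digest of the key. Stands in for real CBC."""
    key_pad = hashlib.sha256(key).digest() * (block_size // 32)
    prev = bytes(block_size)
    out = bytearray()
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size].ljust(block_size, b"\0")
        enc = bytes(b ^ p ^ k for b, p, k in zip(block, prev, key_pad))
        out += enc
        prev = enc
    return bytes(out)


def naive_proof(data: bytes, identity_key: bytes) -> str:
    # Naive PoRep: encrypt the entire dataset with the identity's key,
    # then hash the full encrypted copy.
    return hashlib.sha256(toy_cbc_encrypt(data, identity_key)).hexdigest()


# Verification space scales as number_of_proofs * data_size, because the
# validator must regenerate every identity's full encrypted copy:
data_size = 1 << 20         # 1 MB dataset (illustrative)
number_of_proofs = 1000
print(number_of_proofs * data_size)  # 1048576000 bytes a validator must cover
```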
@@ -14,20 +22,20 @@ capacity for pieces of the network.

 Storage mining client, stores some part of the ledger enumerated in blocks and
 submits storage proofs to the chain. Not a full-node.

-#### ledger block
+#### ledger segment

 Portion of the ledger which is downloaded by the replicator and from which
 storage proof data is derived.

 #### CBC block

-Smallest encrypted chunk of ledger, an encrypted ledger block would be made of
-many CBC blocks. `(size of ledger block) / (size of cbc block)` to be exact.
+Smallest encrypted chunk of ledger; an encrypted ledger segment would be made of
+many CBC blocks. `ledger_segment_size / cbc_block_size` to be exact.

 #### storage proof

 A set of sha hash state which is constructed by sampling the encrypted version
-of the stored ledger block at certain offsets.
+of the stored ledger segment at certain offsets.

 #### fake storage proof

@@ -56,28 +64,16 @@ observed which rewards the parties of the storage proofs and confirmations.

 The number of keys and samples that a validator can verify each storage epoch.

-## Background
-
-The basic idea to Proof of Replication is encrypting a dataset with a public
-symmetric key using CBC encryption, then hash the encrypted dataset. The main
-problem with the naive approach is that a dishonest storage node can stream the
-encryption and delete the data as its hashed. The simple solution is to force
-the hash to be done on the reverse of the encryption, or perhaps with a random
-order. This ensures that all the data is present during the generation of the
-proof and it also requires the validator to have the entirety of the encrypted
-data present for verification of every proof of every identity. So the space
-required to validate is `(Number of Proofs)*(data size)`
-
 ## Optimization with PoH

-Our improvement on this approach is to randomly sample the encrypted blocks
+Our improvement on this approach is to randomly sample the encrypted segments
 faster than it takes to encrypt, and record the hash of those samples into the
-PoH ledger. Thus the blocks stay in the exact same order for every PoRep and
+PoH ledger. Thus the segments stay in the exact same order for every PoRep and
 verification can stream the data and verify all the proofs in a single batch.
 This way we can verify multiple proofs concurrently, each one on its own CUDA
-core. The total space required for verification is `(1 ledger block) + (2 CBC
-blocks) * (Number of Identities)`, with core count of equal to (Number of
-Identities). We use a 64-byte chacha CBC block size.
+core. The total space required for verification is `1_ledger_segment +
+2_cbc_blocks * number_of_identities` with a core count equal to
+`number_of_identities`. We use a 64-byte chacha CBC block size.

 ## Network

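The arithmetic behind the `1_ledger_segment + 2_cbc_blocks * number_of_identities` formula is worth making concrete. The 64-byte chacha CBC block size comes from the text above; the segment size and identity count below are hypothetical, chosen only to show that the per-identity state is negligible next to the streamed segment.

```python
CBC_BLOCK_SIZE = 64            # 64-byte chacha CBC block, per the text
LEDGER_SEGMENT_SIZE = 1 << 30  # hypothetical 1 GiB segment, for illustration
NUM_IDENTITIES = 1_000         # hypothetical identity count

# The segment is streamed once; each identity's CBC chain only needs the
# current and previous cipher block resident while verifying its samples.
space = LEDGER_SEGMENT_SIZE + 2 * CBC_BLOCK_SIZE * NUM_IDENTITIES
print(space)  # 1073869824 -> the per-identity state adds only 128 KB
```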
@@ -106,8 +102,8 @@ changes to determine what rate it can validate storage proofs.

 ### Constants

-1. NUM\_STORAGE\_ENTRIES: Number of entries in a block of ledger data. The unit
-of storage for a replicator.
+1. NUM\_STORAGE\_ENTRIES: Number of entries in a segment of ledger data. The
+unit of storage for a replicator.
 2. NUM\_KEY\_ROTATION\_TICKS: Number of ticks to save a PoH value and cause a
 key generation for the section of ledger just generated and the rotation of
 another key in the set.
@@ -167,19 +163,19 @@ is:

 2. A replicator obtains the PoH hash corresponding to the last key rotation
 along with its entry\_height.
 3. The replicator signs the PoH hash with its keypair. That signature is the
-seed used to pick the block to replicate and also the encryption key. The
-replicator mods the signature with the entry\_height to get which block to
+seed used to pick the segment to replicate and also the encryption key. The
+replicator mods the signature with the entry\_height to get which segment to
 replicate.
 4. The replicator retrieves the ledger by asking peer validators and
 replicators. See 6.5.
-5. The replicator then encrypts that block with the key with chacha algorithm
+5. The replicator then encrypts that segment with the key using the chacha
 algorithm in CBC mode with NUM\_CHACHA\_ROUNDS of encryption.
 6. The replicator initializes a chacha rng with the signature from step 2 as
 the seed.
 7. The replicator generates NUM\_STORAGE\_SAMPLES samples in the range of the
-entry size and samples the encrypted block with sha256 for 32-bytes at each
+entry size and samples the encrypted segment with sha256 for 32-bytes at each
 offset value. Sampling the state should be faster than generating the encrypted
-block.
+segment.
 8. The replicator sends a PoRep proof transaction which contains its sha state
 at the end of the sampling operation, its seed and the samples it used to the
 current leader and it is put onto the ledger.
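Steps 6 and 7 above can be sketched in miniature. This is a toy: `random.Random` stands in for the chacha rng, the constants are illustrative rather than the cluster's real values, and `sample_proof` is a hypothetical helper, not an actual Solana function.

```python
import hashlib
import random

NUM_STORAGE_SAMPLES = 16  # illustrative; the real constant is set by the cluster
SAMPLE_WIDTH = 32         # sha256 over 32 bytes at each offset, per step 7


def sample_proof(encrypted_segment: bytes, signature: bytes) -> str:
    """Seed an rng with the signature from step 2, draw sample offsets, and
    fold the 32-byte sample at each offset into one running sha256 state."""
    rng = random.Random(signature)
    state = hashlib.sha256()
    for _ in range(NUM_STORAGE_SAMPLES):
        offset = rng.randrange(len(encrypted_segment) - SAMPLE_WIDTH)
        state.update(encrypted_segment[offset:offset + SAMPLE_WIDTH])
    return state.hexdigest()


segment = bytes(range(256)) * 64  # pretend this is the encrypted segment
sig = b"signature from step 2"
proof = sample_proof(segment, sig)
# A validator holding the same encrypted segment recomputes the same state:
assert proof == sample_proof(segment, sig)
```

Sampling touches only `NUM_STORAGE_SAMPLES * SAMPLE_WIDTH` bytes, which is why it can be done faster than re-encrypting the whole segment.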
@@ -198,9 +194,9 @@ frozen.

 ### Finding who has a given block of ledger

 1. Validators monitor the transaction stream for storage mining proofs, and
-keep a mapping of ledger blocks by entry\_height to public keys. When it sees a
-storage mining proof it updates this mapping and provides an RPC interface
+keep a mapping of ledger segments by entry\_height to public keys. When it sees
+a storage mining proof it updates this mapping and provides an RPC interface
 which takes an entry\_height and hands back a list of public keys. The client
 then looks up in their cluster\_info table to see which network address that
 corresponds to and sends a repair request to retrieve the necessary blocks of
 ledger.