diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 3f09927462..683df4ab61 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -29,6 +29,7 @@
   - [Fork Selection](fork-selection.md)
   - [Blocktree](blocktree.md)
   - [Data Plane Fanout](data-plane-fanout.md)
+  - [Reliable Vote Transmission](reliable-vote-transmission.md)
 
 - [Economic Design](ed_overview.md)
   - [Validation-client Economics](ed_validation_client_economics.md)
diff --git a/book/src/reliable-vote-transmission.md b/book/src/reliable-vote-transmission.md
new file mode 100644
index 0000000000..f5454ca8ee
--- /dev/null
+++ b/book/src/reliable-vote-transmission.md
@@ -0,0 +1,124 @@
+# Reliable Vote Transmission
+
+Validator votes are messages that have a critical function for consensus and
+continuous operation of the network. Therefore it is critical that they be
+reliably delivered and encoded into the ledger.
+
+## Challenges
+
+1. Leader rotation is triggered by PoH, which is a clock with high drift. As a
+result, many nodes are likely to have an incorrect view of whether the next
+leader is active in real time or not.
+
+2. The next leader may easily be flooded. Thus a DDoS would prevent delivery
+not only of regular transactions but also of consensus messages.
+
+3. UDP is unreliable, and our asynchronous protocol requires any transmitted
+message to be retransmitted until it is observed in the ledger. Retransmission
+could potentially cause an unintentional *thundering herd* against the leader
+with a large number of validators. The worst-case flood would be
+`(num_nodes * num_retransmits)`.
+
+4. Tracking whether a vote has been transmitted via the ledger does not
+guarantee it will appear in a confirmed block. The currently observed block may
+be rolled back. Validators would need to maintain state for each vote and fork.
+
+## Design
+
+1. Send votes as a push message through gossip. This ensures delivery of the
+vote to all of the upcoming leaders, not just the immediately next one.
+
+2. Leaders will read the Crds table for new votes and encode any newly received
+votes into the blocks they propose. This allows validator votes to be included
+by all future leaders, even on forks built after a rollback.
+
+3. Validators that observe votes in the ledger will add them directly to their
+local Crds table, rather than handling them as a push request. This
+short-circuits the push message protocol, so validation messages do not need to
+be retransmitted twice around the network.
+
+4. The CrdsValue for a vote should look like this: ``` Votes(Vec<Transaction>) ```
+
+Each vote transaction should maintain a `wallclock` in its userdata. The merge
+strategy for Votes will keep the last N votes, as configured by the local
+client. For push/pull the vector is traversed recursively and each Transaction
+is treated as an individual CrdsValue with its own local wallclock and
+signature.
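+
+The merge rule above is only described informally. The following is a minimal
+Rust sketch of one way it might work; the `VoteTransaction` and `Votes` types
+are hypothetical stand-ins for the real Transaction and CrdsValue definitions,
+and only the keep-the-newest-N-by-wallclock behavior is taken from the text.
+
+```rust
+// Simplified sketch: merge incoming vote transactions and retain only the
+// newest `max_votes` of them, ordered by the wallclock each vote carries in
+// its userdata. Types here are illustrative, not the real Solana structs.
+
+#[derive(Clone, Debug)]
+struct VoteTransaction {
+    /// Millisecond wallclock recorded by the voting validator; the signature
+    /// and vote contents are elided in this sketch.
+    wallclock: u64,
+}
+
+#[derive(Clone, Debug, Default)]
+struct Votes {
+    txs: Vec<VoteTransaction>,
+}
+
+impl Votes {
+    /// Merge newly received vote transactions into the local entry,
+    /// keeping only the newest `max_votes` by wallclock.
+    fn merge(&mut self, new_txs: Vec<VoteTransaction>, max_votes: usize) {
+        self.txs.extend(new_txs);
+        self.txs.sort_by(|a, b| b.wallclock.cmp(&a.wallclock)); // newest first
+        self.txs.truncate(max_votes);
+    }
+}
+
+fn main() {
+    let mut local = Votes {
+        txs: vec![VoteTransaction { wallclock: 100 }],
+    };
+    // Votes arriving via gossip push/pull, or observed in the ledger, are
+    // merged the same way; with N = 5 only the 5 most recent survive.
+    let incoming: Vec<VoteTransaction> = (101..110)
+        .map(|wallclock| VoteTransaction { wallclock })
+        .collect();
+    local.merge(incoming, 5);
+    assert_eq!(local.txs.len(), 5);
+    assert_eq!(local.txs[0].wallclock, 109);
+    println!("{:?}", local.txs);
+}
+```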
+
+Gossip is designed for efficient propagation of state. Messages that are sent
+through gossip-push are batched and propagated with a minimum spanning tree to
+the rest of the network. Any partial failures in the tree are actively repaired
+with the gossip-pull protocol while minimizing the amount of data transferred
+between any nodes.
+
+## How this design solves the Challenges
+
+1. Because there is no easy way for validators to be in sync with leaders on
+the leader's "active" state, gossip allows for eventual delivery regardless of
+that state.
+
+2. Gossip will deliver the messages to all the subsequent leaders, so if the
+current leader is flooded the next leader will already have received these
+votes and is able to encode them.
+
+3. Gossip minimizes the number of requests through the network by maintaining
+an efficient spanning tree and using bloom filters to repair state, so
+retransmit back-off is not necessary and messages are batched.
+
+4. Leaders that read the Crds table for votes will encode all the new valid
+votes that appear in the table. Even if this leader's block is rolled back, the
+next leader will try to add the same votes without any additional work by the
+validator. This ensures not only eventual delivery, but also eventual encoding
+into the ledger.
+
+## Performance
+
+1. Worst-case propagation time to the next leader is log(N) hops, with the base
+of the logarithm determined by the fanout. With our current default fanout of
+6, it is about 6 hops to 20k nodes.
+
+2. The leader should receive 20k validation votes aggregated by gossip-push
+into 64kb blobs, which would reduce the number of packets for a 20k-node
+network to 80 blobs.
+
+3. Each validator's votes are replicated across the entire network. To maintain
+a queue of 5 previous votes the Crds table would grow by 25 megabytes
+`(20,000 nodes * 256 bytes * 5)`.
+
+## Two-step implementation rollout
+
+Initially the network can perform reliably with just 1 vote transmitted and
+maintained through the network with the current Vote implementation. For small
+networks a fanout of 6 is sufficient, and the memory and push overhead is
+minor.
+
+### Sub 1k validator network
+
+1. Crds just maintains the validator's latest vote.
+
+2. Votes are pushed and retransmitted regardless of whether they appear in the
+ledger.
+
+3. Fanout of 6.
+
+* Worst case 256kb memory overhead per node.
+* Worst case 4 hops to propagate to every node.
+* The leader should receive the entire validator vote set in 4 push message
+blobs.
+
+### Sub 20k network
+
+Everything above plus the following:
+
+1. The Crds table maintains a vector of the 5 latest validator votes.
+
+2. Votes encode a wallclock. CrdsValue::Votes is a type that recurses into the
+transaction vector for all the gossip protocols.
+
+3. Increase fanout to 20.
+
+* Worst case 25mb memory overhead per node.
+* Sub 4 hops worst case to deliver to the entire network.
+* 80 blobs received by the leader for all the validator messages (see the
+arithmetic sketch below).
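+
+The figures above and in the rollout targets are simple arithmetic. As a sanity
+check, the following is a small Rust sketch of that arithmetic. The 256-byte
+vote size, 64kb blob size, fanout values, and 5-vote queue come from this
+document; the function names and the assumption of perfectly packed blobs are
+illustrative only.
+
+```rust
+// Back-of-envelope arithmetic for the gossip vote-transmission figures.
+// The constants mirror the numbers in the text; the function names are
+// purely illustrative.
+
+/// Worst-case depth of the gossip-push spanning tree: log base `fanout` of
+/// the node count.
+fn propagation_hops(nodes: f64, fanout: f64) -> f64 {
+    nodes.log(fanout)
+}
+
+/// Crds memory overhead of keeping `votes_kept` votes per validator.
+fn crds_overhead_bytes(nodes: u64, vote_size: u64, votes_kept: u64) -> u64 {
+    nodes * vote_size * votes_kept
+}
+
+/// Number of blobs needed to carry one vote from every validator, assuming
+/// perfect packing.
+fn blobs_to_leader(nodes: u64, vote_size: u64, blob_size: u64) -> u64 {
+    (nodes * vote_size + blob_size - 1) / blob_size
+}
+
+fn main() {
+    let (vote_size, blob_size) = (256, 64 * 1024);
+
+    // Sub 1k validator network: fanout of 6, 1 vote kept per validator.
+    println!("1k:  {:.1} hops", propagation_hops(1_000.0, 6.0)); // ~3.9, i.e. 4 hops
+    println!("1k:  {} bytes", crds_overhead_bytes(1_000, vote_size, 1)); // 256,000 (~256kb)
+    println!("1k:  {} blobs", blobs_to_leader(1_000, vote_size, blob_size)); // 4
+
+    // Sub 20k network: fanout of 20, 5 votes kept per validator.
+    println!("20k: {:.1} hops", propagation_hops(20_000.0, 20.0)); // ~3.3, i.e. sub 4 hops
+    println!("20k: {} bytes", crds_overhead_bytes(20_000, vote_size, 5)); // 25,600,000 (~25mb)
+    println!("20k: {} blobs", blobs_to_leader(20_000, vote_size, blob_size)); // 79, ~80 with headers
+}
+```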