Move data plane fanout chapter out of proposals
@ -15,6 +15,7 @@
|
|||||||
- [Synchronization](synchronization.md)
|
- [Synchronization](synchronization.md)
|
||||||
- [Leader Rotation](leader-rotation.md)
|
- [Leader Rotation](leader-rotation.md)
|
||||||
- [Fork Generation](fork-generation.md)
|
- [Fork Generation](fork-generation.md)
|
||||||
|
- [Data Plane Fanout](data-plane-fanout.md)
|
||||||
|
|
||||||
- [Anatomy of a Fullnode](fullnode.md)
|
- [Anatomy of a Fullnode](fullnode.md)
|
||||||
- [TPU](tpu.md)
|
- [TPU](tpu.md)
|
||||||
@ -34,7 +35,6 @@
|
|||||||
- [Secure Vote Signing](vote-signer.md)
|
- [Secure Vote Signing](vote-signer.md)
|
||||||
- [Staking Rewards](staking-rewards.md)
|
- [Staking Rewards](staking-rewards.md)
|
||||||
- [Fork Selection](fork-selection.md)
|
- [Fork Selection](fork-selection.md)
|
||||||
- [Data Plane Fanout](data-plane-fanout.md)
|
|
||||||
- [Reliable Vote Transmission](reliable-vote-transmission.md)
|
- [Reliable Vote Transmission](reliable-vote-transmission.md)
|
||||||
- [Bank Forks](bank-forks.md)
|
- [Bank Forks](bank-forks.md)
|
||||||
- [Cluster Economics](ed_overview.md)
|
- [Cluster Economics](ed_overview.md)
|
||||||
|
@ -1,48 +1,11 @@
|
|||||||
# Data Plane Fanout
|
# Data Plane Fanout
|
||||||
|
|
||||||
This article describes the current single-layer broadcast and retransmit
|
The cluster organizes itself by stake and divides into a collection
|
||||||
mechanisms as well as proposed changes to add a multi-layer retransmit via an
|
|
||||||
Avalanche mechanism.
|
|
||||||
|
|
||||||
## Current Design
|
|
||||||
|
|
||||||
There's two basic parts to the current data plane's fanout design.
|
|
||||||
|
|
||||||
#### Broadcast Service
|
|
||||||
|
|
||||||
In this service, the leader distributes its data across the Layer-1 nodes.
|
|
||||||
Currently, Layer-1 nodes are all known "TVU peers" (`ClusterInfo::tvu_peers`).
|
|
||||||
The leader performs a round-robin broadcast where it sends each blob of data to
|
|
||||||
only one validator at a time. That way each Layer-1 node only receives partial
|
|
||||||
data from the leader and the Retransmit Stage in each Layer-1 node's TVU will
|
|
||||||
ensure all data is shared between its Layer-1 peers and a complete window is
|
|
||||||
received.
|
|
||||||
|
|
||||||
#### Retransmit Stage
|
|
||||||
|
|
||||||
The Retransmit stage *forwards* data from a Layer-1 node to all of _its_
|
|
||||||
"retransmit peers" (list of TVU peers excluding the leader). So as nodes start
|
|
||||||
seeing complete windows they can send their votes back to the leader.
|
|
||||||
Validators know to only forward blobs that came from the leader by checking the
|
|
||||||
signatures against the current leader.
|
|
||||||
|
|
||||||
**Cluster_info -> retransmit** = Used by TVUs to retransmit. Currently Layer-1
|
|
||||||
sends this to all TVU peers, leader is automatically excluded.
|
|
||||||
|
|
||||||
**Cluster_info -> broadcast** = Used by leader to broadcast to layer-1 nodes.
|
|
||||||
|
|
||||||
**BroadcastService -> run** = Used by leader (TPU) to broadcast to all
|
|
||||||
validators. Currently all TVU Peers are considered layer-1. But blobs are
|
|
||||||
transmitted sort of round robin. See Cluster_info->Broadcast.
|
|
||||||
|
|
||||||
## Proposed Design
|
|
||||||
|
|
||||||
The new design organizes the network by stake and divides it into a collection
|
|
||||||
of nodes, called `neighborhoods`. The leader broadcasts its blobs to the
|
of nodes, called `neighborhoods`. The leader broadcasts its blobs to the
|
||||||
layer-1 (neighborhood 0) nodes exactly like it does without this mechanism. The
|
layer-1 (neighborhood 0) nodes exactly like it does without this mechanism. The
|
||||||
main difference being the number of nodes in layer-1 is capped via the
|
main difference being the number of nodes in layer-1 is capped via the
|
||||||
configurable `DATA_PLANE_FANOUT`. If the fanout is smaller than the nodes in
|
configurable `DATA_PLANE_FANOUT`. If the fanout is smaller than the nodes in
|
||||||
the network then the mechanism will add layers below layer-1. Subsequent layers
|
the cluster then the mechanism will add layers below layer-1. Subsequent layers
|
||||||
(beyond layer-1) follow the following constraints to determine layer-capacity.
|
(beyond layer-1) follow the following constraints to determine layer-capacity.
|
||||||
Each neighborhood has `NEIGHBORHOOD_SIZE` nodes and `fanout/2` neighborhoods
|
Each neighborhood has `NEIGHBORHOOD_SIZE` nodes and `fanout/2` neighborhoods
|
||||||
are allowed per layer.
|
are allowed per layer.
|
||||||
@ -54,21 +17,21 @@ peers). This means any node has to only send its data to its neighbors and
|
|||||||
each neighborhood in the layer below instead of every single TVU peer it has.
|
each neighborhood in the layer below instead of every single TVU peer it has.
|
||||||
The retransmit mechanism also supports a second, `grow`, mode of operation
|
The retransmit mechanism also supports a second, `grow`, mode of operation
|
||||||
that squares the number of neighborhoods allowed per layer which dramatically
|
that squares the number of neighborhoods allowed per layer which dramatically
|
||||||
reduces the number of layers needed to support a large network but can also
|
reduces the number of layers needed to support a large cluster but can also
|
||||||
have a negative impact on the network pressure each node in the lower layers
|
have a negative impact on the network pressure each node in the lower layers
|
||||||
has to deal with. A good way to think of the default mode (when `grow` is
|
has to deal with. A good way to think of the default mode (when `grow` is
|
||||||
disabled) is to imagine it as `chain` of layers where the leader sends blobs to
|
disabled) is to imagine it as `chain` of layers where the leader sends blobs to
|
||||||
layer-1 and then layer-1 to layer-2 and so on, but instead of growing layer-3
|
layer-1 and then layer-1 to layer-2 and so on, but instead of growing layer-3
|
||||||
to the square of number of nodes in layer-2, we keep the `layer capacities`
|
to the square of number of nodes in layer-2, we keep the `layer capacities`
|
||||||
constant, so all layers past layer-2 will have the same number of nodes until
|
constant, so all layers past layer-2 will have the same number of nodes until
|
||||||
the whole network is covered. When `grow` is enabled, this quickly turns into a
|
the whole cluster is covered. When `grow` is enabled, this quickly turns into a
|
||||||
traditional fanout where layer-3 will have the square of the number of nodes in
|
traditional fanout where layer-3 will have the square of the number of nodes in
|
||||||
layer-2 and so on.
|
layer-2 and so on.
|
||||||
|
|
||||||
Below is an example of a two layer network. Note - this example doesn't
|
Below is an example of a two layer cluster. Note - this example doesn't
|
||||||
describe the same `fanout/2` limit for lower layer neighborhoods.
|
describe the same `fanout/2` limit for lower layer neighborhoods.
|
||||||
|
|
||||||
<img alt="Two layer network" src="img/data-plane.svg" class="center"/>
|
<img alt="Two layer cluster" src="img/data-plane.svg" class="center"/>
|
||||||
|
|
||||||
#### Neighborhoods
|
#### Neighborhoods
|
||||||
|
|
||||||
@ -87,7 +50,7 @@ src="img/data-plane-neighborhood.svg" class="center"/>
|
|||||||
#### A Weighted Selection Mechanism
|
#### A Weighted Selection Mechanism
|
||||||
|
|
||||||
To support this mechanism, there needs to be an agreed-upon way of dividing the
|
To support this mechanism, there needs to be an agreed-upon way of dividing the
|
||||||
network amongst the nodes. To achieve this the `tvu_peers` are sorted by stake
|
cluster amongst the nodes. To achieve this the `tvu_peers` are sorted by stake
|
||||||
and stored in a list. This list can then be indexed in different ways to figure
|
and stored in a list. This list can then be indexed in different ways to figure
|
||||||
out neighborhood boundaries and retransmit peers. For example, the leader will
|
out neighborhood boundaries and retransmit peers. For example, the leader will
|
||||||
simply select the first `DATA_PLANE_FANOUT` nodes as its layer 1 nodes. These
|
simply select the first `DATA_PLANE_FANOUT` nodes as its layer 1 nodes. These
|
||||||
@ -113,7 +76,7 @@ lower layer neighborhood.
|
|||||||
|
|
||||||
Each node can receive blobs from its peer in the layer above as well as its
|
Each node can receive blobs from its peer in the layer above as well as its
|
||||||
neighbors. As long as the failure rate is less than the number of erasure
|
neighbors. As long as the failure rate is less than the number of erasure
|
||||||
codes, blobs can be repaired without the network failing.
|
codes, blobs can be repaired without the cluster failing.
|
||||||
|
|
||||||
#### Constraints
|
#### Constraints
|
||||||
|
|
||||||
@ -130,5 +93,5 @@ capacities. When this mode is disabled (default) all layers after layer 1 have
|
|||||||
the same capacity to keep the network pressure on all nodes equal.
|
the same capacity to keep the network pressure on all nodes equal.
|
||||||
|
|
||||||
Future work would involve moving these parameters to on chain configuration
|
Future work would involve moving these parameters to on chain configuration
|
||||||
since it might be beneficial to tune these on the fly as the network sizes change.
|
since it might be beneficial to tune these on the fly as the cluster sizes change.
|
||||||
|
|
||||||
|
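The layer sizing rules described in the chapter can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the repository. The names `layer_sizes` and `assign_layers` are hypothetical. The sketch assumes that peers are sorted with the highest stake first, and that `grow` squares the per-layer neighborhood count at each step below layer 2. The chapter's wording on `grow` leaves room for other readings.

```python
def layer_sizes(num_nodes, fanout, neighborhood_size, grow=False):
    """Return how many nodes land in each layer, starting at layer 1.

    Hypothetical helper. Layer 1 is capped at `fanout` nodes. Each lower
    layer holds `fanout // 2` neighborhoods of `neighborhood_size` nodes.
    With `grow` enabled, the neighborhood count is squared per layer
    (an assumption about how the chapter's `grow` mode behaves).
    """
    if fanout < 1 or neighborhood_size < 1:
        raise ValueError("fanout and neighborhood_size must be at least 1")
    if num_nodes <= 0:
        return []

    sizes = [min(fanout, num_nodes)]
    remaining = num_nodes - sizes[0]
    # Guard against a zero neighborhood count when fanout is 1.
    neighborhoods = max(fanout // 2, 1)

    while remaining > 0:
        capacity = neighborhoods * neighborhood_size
        used = min(capacity, remaining)
        sizes.append(used)
        remaining -= used
        if grow:
            neighborhoods *= neighborhoods
    return sizes


def assign_layers(peers, fanout, neighborhood_size, grow=False):
    """Split (name, stake) pairs into layers after sorting by stake.

    Sorting highest stake first is an assumption; the chapter only says
    the peers are sorted by stake and the leader takes the first
    `DATA_PLANE_FANOUT` entries as layer 1.
    """
    ordered = sorted(peers, key=lambda peer: -peer[1])
    layers, start = [], 0
    for size in layer_sizes(len(ordered), fanout, neighborhood_size, grow):
        layers.append([name for name, _stake in ordered[start:start + size]])
        start += size
    return layers
```

With 1000 nodes, a fanout of 10 and neighborhoods of 10 nodes, the default mode yields one layer of 10 followed by many layers of at most 50, while `grow` covers the same cluster in four layers. That is the trade-off the chapter describes: fewer layers in exchange for more network pressure on nodes in the lower layers.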