Reduce Avalanche redundancy and implement traditional fanout (#4174)

* Reduce Avalanche redundancy and implement traditional fanout

* Revert tiny fanout

* Update diagrams and docs based on review comments
Sagar Dhawan
2019-05-07 13:24:58 -07:00
committed by GitHub
parent 4f3b22d04e
commit 2107e15bd3
8 changed files with 218 additions and 214 deletions

View File

@@ -0,0 +1,19 @@
+------------------------------------------------------------------+
|                                                                  |
|   +-----------------+    Neighborhood 0    +-----------------+   |
|   |                 +--------------------->+                 |   |
|   |   Validator 1   |                      |   Validator 2   |   |
|   |                 +<---------------------+                 |   |
|   +--------+-+------+                      +------+-+--------+   |
|            | |                                    | |            |
|            | +-----------------------------+      | |            |
|            |      +------------------------+------+ |            |
|            |      |                        |        |            |
+------------------------------------------------------------------+
             |      |                        |        |
             v      v                        v        v
   +---------+------+---+                  +-+--------+---------+
   |                    |                  |                    |
   |   Neighborhood 1   |                  |   Neighborhood 2   |
   |                    |                  |                    |
   +--------------------+                  +--------------------+

View File

@@ -0,0 +1,15 @@
                          +--------------+
                          |              |
             +------------+    Leader    +------------+
             |            |              |            |
             |            +--------------+            |
             v                                        v
+------------+----------------------------------------+------------+
|                                                                  |
|   +-----------------+    Neighborhood 0    +-----------------+   |
|   |                 +--------------------->+                 |   |
|   |   Validator 1   |                      |   Validator 2   |   |
|   |                 +<---------------------+                 |   |
|   +-----------------+                      +-----------------+   |
|                                                                  |
+------------------------------------------------------------------+

View File

@@ -1,28 +1,18 @@
+--------------+
| |
+------------+ Leader +------------+
| | | |
| +--------------+ |
v v
+--------+--------+ +--------+--------+
| +--------------------->+ |
+-----------------+ Validator 1 | | Validator 2 +-------------+
| | +<---------------------+ | |
| +------+-+-+------+ +---+-+-+---------+ |
| | | | | | | |
| | | | | | | |
| +---------------------------------------------+ | | |
| | | | | | | |
| | | | | +----------------------+ | |
| | | | | | | |
| | | | +--------------------------------------------+ |
| | | | | | | |
| | | +----------------------+ | | |
| | | | | | | |
v v v v v v v v
+--------------------+ +--------------------+ +--------------------+ +--------------------+
| | | | | | | |
| Neighborhood 1 | | Neighborhood 2 | | Neighborhood 3 | | Neighborhood 4 |
| | | | | | | |
+--------------------+ +--------------------+ +--------------------+ +--------------------+
                                  +--------------------+
                                  |                    |
                         +--------+   Neighborhood 0   +----------+
                         |        |                    |          |
                         |        +--------------------+          |
                         v                                        v
               +---------+----------+                  +----------+---------+
               |                    |                  |                    |
               |   Neighborhood 1   |                  |   Neighborhood 2   |
               |                    |                  |                    |
               +---+-----+----------+                  +----------+-----+---+
                   |     |                                        |     |
                   v     v                                        v     v
+------------------+-+ +-+------------------+  +------------------+-+ +-+------------------+
|                    | |                    |  |                    | |                    |
|   Neighborhood 3   | |   Neighborhood 4   |  |   Neighborhood 5   | |   Neighborhood 6   |
|                    | |                    |  |                    | |                    |
+--------------------+ +--------------------+  +--------------------+ +--------------------+

View File

@@ -5,16 +5,15 @@ broadcast transaction blobs to all nodes in a very quick and efficient manner.
In order to establish the fanout, the cluster divides itself into small
collections of nodes, called *neighborhoods*. Each node is responsible for
sharing any data it receives with the other nodes in its neighborhood, as well
as propagating the data on to a small set of nodes in other neighborhoods.
as propagating the data on to a small set of nodes in other neighborhoods.
This way each node only has to communicate with a small number of nodes.
During its slot, the leader node distributes blobs between the validator nodes
in one neighborhood (layer 1). Each validator shares its data within its
neighborhood, but also retransmits the blobs to one node in each of multiple
neighborhoods in the next layer (layer 2). The layer-2 nodes each share their
data with their neighborhood peers, and retransmit to nodes in the next layer,
etc., until all nodes in the cluster have received all the blobs.
<img alt="Two layer cluster" src="img/data-plane.svg" class="center"/>
in the first neighborhood (layer 0). Each validator shares its data within its
neighborhood, but also retransmits the blobs to one node in some neighborhoods
in the next layer (layer 1). The layer-1 nodes each share their data with their
neighborhood peers, and retransmit to nodes in the next layer, etc., until all
nodes in the cluster have received all the blobs.
## Neighborhood Assignment - Weighted Selection
@@ -23,48 +22,50 @@ cluster is divided into neighborhoods. To achieve this, all the recognized
validator nodes (the TVU peers) are sorted by stake and stored in a list. This
list is then indexed in different ways to figure out neighborhood boundaries and
retransmit peers. For example, the leader will simply select the first nodes to
make up layer 1. These will automatically be the highest stake holders, allowing
the heaviest votes to come back to the leader first. Layer-1 and lower-layer
nodes use the same logic to find their neighbors and lower layer peers.
make up layer 0. These will automatically be the highest stake holders, allowing
the heaviest votes to come back to the leader first. Layer-0 and lower-layer
nodes use the same logic to find their neighbors and next layer peers.
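The selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tie-break by node id, and the sample stakes are hypothetical and do not come from the cluster code. It assumes neighborhoods are consecutive chunks of `fanout` nodes taken from the stake-sorted list, so the first chunk is layer 0.

```python
def sort_by_stake(stakes):
    """Return node ids ordered from highest to lowest stake."""
    # Ties are broken by node id so the order is deterministic (an assumption).
    return sorted(stakes, key=lambda node: (-stakes[node], node))


def split_into_neighborhoods(ordered, fanout):
    """Chunk the stake-ordered list into consecutive groups of `fanout` nodes."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    return [ordered[i:i + fanout] for i in range(0, len(ordered), fanout)]


# Hypothetical stakes, for illustration only.
stakes = {"a": 5, "b": 50, "c": 20, "d": 1, "e": 7}
ordered = sort_by_stake(stakes)
neighborhoods = split_into_neighborhoods(ordered, fanout=2)

print(ordered)           # ['b', 'c', 'e', 'a', 'd']
print(neighborhoods[0])  # ['b', 'c'] -> the highest-staked nodes form layer 0
```

Only the last chunk can be smaller than `fanout`, since each chunk is filled before the next one starts.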
## Layer and Neighborhood Structure
The current leader makes its initial broadcasts to at most `DATA_PLANE_FANOUT`
nodes. If this layer 1 is smaller than the number of nodes in the cluster, then
nodes. If this layer 0 is smaller than the number of nodes in the cluster, then
the data plane fanout mechanism adds layers below. Subsequent layers follow
these constraints to determine layer-capacity: Each neighborhood contains
`NEIGHBORHOOD_SIZE` nodes and each layer may have up to `DATA_PLANE_FANOUT/2`
neighborhoods.
`DATA_PLANE_FANOUT` nodes. Layer-0 consists of a single neighborhood of
`DATA_PLANE_FANOUT` nodes, and each additional layer holds `DATA_PLANE_FANOUT`
times as many nodes as the layer above it.
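As a rough illustration of this growth rule, the sketch below computes how many nodes land in each layer for a given cluster size. `layer_sizes` is a hypothetical helper, not part of the codebase, and the cluster sizes and fanout values are made-up inputs.

```python
def layer_sizes(num_nodes, fanout):
    """Return the number of nodes placed in each layer, starting at layer 0.

    Layer 0 holds up to `fanout` nodes and each following layer holds up to
    `fanout` times the capacity of the previous one. The last layer may be
    only partially filled.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    sizes = []
    capacity = fanout
    remaining = num_nodes
    while remaining > 0:
        placed = min(capacity, remaining)
        sizes.append(placed)
        remaining -= placed
        capacity *= fanout
    return sizes


print(layer_sizes(14, 2))        # [2, 4, 8]
print(layer_sizes(10, 2))        # [2, 4, 4] -> last layer is partially filled
print(layer_sizes(50_000, 200))  # [200, 40000, 9800]
```

With a fanout of 2, the first call matches the diagram of neighborhoods 0 through 6: 1, 2, and 4 neighborhoods of 2 nodes each.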
As mentioned above, each node in a layer only has to broadcast its blobs to its
neighbors and to exactly 1 node in each next-layer neighborhood, instead of to
every TVU peer in the cluster. In the default mode, each layer contains
`DATA_PLANE_FANOUT/2` neighborhoods. The retransmit mechanism also supports a
second, `grow`, mode of operation that squares the number of neighborhoods
allowed each layer. This dramatically reduces the number of layers needed to
support a large cluster, but can also have a negative impact on the network
pressure on each node in the lower layers. A good way to think of the default
mode (when `grow` is disabled) is to imagine it as a chain of layers, where the
leader sends blobs to layer-1 and then layer-1 to layer-2 and so on, the `layer
capacities` remain constant, so all layers past layer-2 will have the same
number of nodes until the whole cluster is covered. When `grow` is enabled, this
becomes a traditional fanout where layer-3 will have the square of the number of
nodes in layer-2 and so on.
neighbors and to exactly 1 node in some next-layer neighborhoods,
instead of to every TVU peer in the cluster. One way to think about this:
layer-0 starts with 1 neighborhood of fanout nodes, layer-1 adds `fanout`
neighborhoods of fanout nodes each, layer-2 has
`fanout * number of nodes in layer-1` nodes, and so on.
This way each node only has to communicate with a maximum of `2 * DATA_PLANE_FANOUT - 1` nodes:
its `DATA_PLANE_FANOUT - 1` neighbors plus at most `DATA_PLANE_FANOUT` nodes in the next layer.
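The peer bound can be checked with a small sketch. It assumes, hypothetically, that nodes are numbered by their position in the stake-sorted list, that neighborhoods are numbered as in the diagrams (the children of neighborhood `n` are neighborhoods `n * fanout + 1` through `n * fanout + fanout`), and that a node retransmits to the node at its own position in each child neighborhood. `retransmit_peers` is an illustrative name, and the real indexing may differ.

```python
def retransmit_peers(index, fanout, num_nodes):
    """Return (neighbors, children) for the node at `index`.

    neighbors: the other nodes in the same neighborhood.
    children:  one node in each child neighborhood in the next layer.
    """
    hood, pos = divmod(index, fanout)
    start = hood * fanout
    neighbors = [i for i in range(start, min(start + fanout, num_nodes))
                 if i != index]
    children = []
    for c in range(1, fanout + 1):
        child_hood = hood * fanout + c      # assumed numbering scheme
        target = child_hood * fanout + pos  # same position in the child
        if target < num_nodes:
            children.append(target)
    return neighbors, children


fanout, num_nodes = 2, 14
for index in range(num_nodes):
    neighbors, children = retransmit_peers(index, fanout, num_nodes)
    # No node ever talks to more than 2 * fanout - 1 peers.
    assert len(neighbors) + len(children) <= 2 * fanout - 1

print(retransmit_peers(0, fanout, num_nodes))  # ([1], [2, 4])
print(retransmit_peers(2, fanout, num_nodes))  # ([3], [6, 8])
```

Under these assumptions every node outside layer 0 is the retransmit target of exactly one node in the layer above, and then receives the remaining data from its neighbors.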
The following diagram shows how the Leader sends blobs with a Fanout of 2 to
Neighborhood 0 in Layer 0 and how the nodes in Neighborhood 0 share their data
with each other.
<img alt="Leader sends blobs to Neighborhood 0 in Layer 0" src="img/data-plane-seeding.svg" class="center"/>
The following diagram shows how Neighborhood 0 fans out to Neighborhoods 1 and 2.
<img alt="Neighborhood 0 Fanout to Neighborhood 1 and 2" src="img/data-plane-fanout.svg" class="center"/>
Finally, the following diagram shows a two-layer cluster with a Fanout of 2.
<img alt="Two layer cluster with a Fanout of 2" src="img/data-plane.svg" class="center"/>
#### Configuration Values
`DATA_PLANE_FANOUT` - Determines the size of layer 1. Subsequent
layers have `DATA_PLANE_FANOUT/2` neighborhoods when `grow` is inactive.
`NEIGHBORHOOD_SIZE` - The number of nodes allowed in a neighborhood.
`DATA_PLANE_FANOUT` - Determines the size of layer 0. Subsequent
layers grow by a factor of `DATA_PLANE_FANOUT`.
The number of nodes in a neighborhood is equal to the fanout value.
Neighborhoods will fill to capacity before new ones are added, i.e., if a
neighborhood isn't full, it _must_ be the last one.
`GROW_LAYER_CAPACITY` - Whether or not retransmit should behave like a
_traditional fanout_, i.e., if each additional layer should have growing
capacities. When this mode is disabled (default), all layers after layer 1 have
the same capacity, keeping the network pressure on all nodes equal.
Currently, configuration is set when the cluster is launched. In the future,
these parameters may be hosted on-chain, allowing modification on the fly as the
cluster sizes change.
@@ -72,13 +73,10 @@ cluster sizes change.
## Neighborhoods
The following diagram shows how two neighborhoods in different layers interact.
What this diagram doesn't capture is that each neighbor actually receives
blobs from one validator per neighborhood above it. This means that, to
cripple a neighborhood, enough nodes (erasure codes +1 per neighborhood) from
the layer above need to fail. Since multiple neighborhoods exist in the upper
layer and a node will receive blobs from a node in each of those neighborhoods,
we'd need a big network failure in the upper layers to end up with incomplete
data.
To cripple a neighborhood, enough nodes (erasure codes +1) from the neighborhood
above need to fail. Since each neighborhood receives blobs from multiple nodes
in a neighborhood in the upper layer, we'd need a big network failure in the upper
layers to end up with incomplete data.
<img alt="Inner workings of a neighborhood"
src="img/data-plane-neighborhood.svg" class="center"/>