Commit Graph

7 Commits

Author SHA1 Message Date
behzad nouri
fb69f45f14 adds fallback & metric for when epoch staked-nodes are none 2021-08-05 21:47:33 +00:00
behzad nouri
50d0e830c9 unifies cluster-nodes computation & caching across turbine stages
Broadcast-stage is using epoch_staked_nodes based on the same slot that
shreds belong to:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

But retransmit-stage is using bank-epoch of the working-bank:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289

So the two are not consistent at epoch boundaries where some nodes may
have a working bank (or similarly a root bank) lagging other nodes. As a
result the node which obtains a packet may construct turbine broadcast
tree inconsistently with its parent node in the tree and so some packets
may fail to reach all nodes in the tree.
2021-08-05 21:47:33 +00:00
behzad nouri
ecc1c7957f implements cluster-nodes cache
Cluster nodes are cached keyed by the respective epoch from which stakes
are obtained, and so if epoch changes cluster-nodes will be recomputed.

A time-to-live eviction policy is enforced to refresh entries in case
gossip contact-infos are updated.
2021-08-05 21:47:33 +00:00
behzad nouri
d2d5f36a3c adds validator flag to allow private ip addresses (#18850) 2021-07-23 15:25:03 +00:00
carllin
588c0464b8 Add sampling logic and DuplicateSlotRepairStatus module (#18721) 2021-07-21 11:15:08 -07:00
behzad nouri
cf31afdd6a makes CrdsGossip thread-safe (#18615) 2021-07-14 22:27:17 +00:00
behzad nouri
04787be8b1 encapsulates turbine peers computations of broadcast & retransmit stages (#18238)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
2021-07-07 00:35:25 +00:00