mergify[bot]
5057aaddc0
Send votes to next leader's TPU instead of our TPU ( #16663 )
...
(cherry picked from commit c8b474cd0b )
Co-authored-by: Michael Vines <mvines@gmail.com >
2021-04-20 08:45:58 +00:00
Michael Vines
a1b0f2f681
Increase test timeout
2021-04-19 04:12:16 +00:00
mergify[bot]
719db7eed0
uses timeouts based on stake for filtering pull responses ( #16549 ) ( #16551 )
...
filter_pull_responses uses the default timeout when discarding pull
responses (except for ContactInfo):
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L349-L350
But purging code uses timeouts based on stake:
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L1867-L1870
So the crds value will not be purged from the sender's table and will be
sent again over the next pull request.
(cherry picked from commit d92721aab9 )
Co-authored-by: behzad nouri <behzadnouri@gmail.com >
2021-04-14 21:43:48 +00:00
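A minimal sketch of the fix described above, with hypothetical names and simplified types (not the actual crds_gossip_pull API): the discard timeout is chosen per origin from stakes, matching the purge path, instead of a flat default.

```rust
use std::collections::HashMap;

// Hypothetical sketch: choose the pull-response discard timeout from the
// origin's stake, mirroring the stake-based timeouts used when purging,
// so the sender does not keep resending values the receiver discards.
fn crds_value_timeout(
    origin: &[u8; 32],               // origin pubkey, simplified
    stakes: &HashMap<[u8; 32], u64>, // stake per node
    default_timeout_ms: u64,         // timeout for unstaked nodes
    epoch_duration_ms: u64,          // longer timeout for staked nodes
) -> u64 {
    match stakes.get(origin) {
        Some(&stake) if stake > 0 => epoch_duration_ms,
        _ => default_timeout_ms,
    }
}
```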
mergify[bot]
4ddb72a32d
prioritizes contact-infos in pull responses ( #16541 ) ( #16550 )
...
Expired crds values whose contact-info does not exist are wasted:
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L353-L378
and then are sent again over the next pull-request.
Also, the stake of the first response (which can be anything) is used to
weight all pull-responses to a node, while the rest of the responses can
have different stakes.
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L2231
(cherry picked from commit f35a6a8be0 )
Co-authored-by: behzad nouri <behzadnouri@gmail.com >
2021-04-14 20:14:22 +00:00
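A hedged sketch of one way to order a pull response so ContactInfo values land first (the enum and helper are hypothetical stand-ins, not the crate's types):

```rust
// Simplified stand-in for the crds value payloads.
enum CrdsData {
    ContactInfo(String),
    Vote(String),
    EpochSlots(String),
}

// Sort so ContactInfo entries come first; the receiver then inserts a
// node's contact-info before judging the rest of its values, instead of
// discarding them for lack of it. `false` sorts before `true`.
fn prioritize(mut response: Vec<CrdsData>) -> Vec<CrdsData> {
    response.sort_by_key(|v| !matches!(v, CrdsData::ContactInfo(_)));
    response
}
```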
Justin Starry
579065443a
v1.6: Use blake3 message hash in status cache ( #16507 )
2021-04-13 16:57:20 +08:00
mergify[bot]
79ee0e06b2
Cluster info shred spies (bp #16389 ) ( #16395 )
...
* cluster-info: Don't subtract non-shred spies from node count
(cherry picked from commit b6b08706b9 )
* cluster-info: Get rid of some integer math while we're here
(cherry picked from commit b71875df61 )
Co-authored-by: Trent Nelson <trent@solana.com >
2021-04-06 01:37:16 +00:00
mergify[bot]
8f852d8a6b
makes test_pull_request_time_pruning smaller ( #16128 ) ( #16144 )
...
(cherry picked from commit b041b55028 )
Co-authored-by: behzad nouri <behzadnouri@gmail.com >
2021-03-26 01:20:26 +00:00
mergify[bot]
7475a6f444
makes turbine peer computation consistent between broadcast and retransmit ( #14910 ) ( #16143 )
...
get_broadcast_peers is using tvu_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/broadcast_stage.rs#L362-L370
which is potentially inconsistent with retransmit_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1332-L1345
Also, the leader does not include its own contact-info when broadcasting
shreds:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1324
but on the retransmit side, the slot leader is removed only _after_ neighbors and
children are computed:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/retransmit_stage.rs#L383-L384
So the turbine broadcast tree is different between the two stages.
This commit:
* Removes retransmit_peers. Broadcast and retransmit stages will use tvu_peers
consistently.
* Retransmit stage removes slot leader _before_ computing children and
neighbors.
(cherry picked from commit 570fd3f810 )
Co-authored-by: behzad nouri <behzadnouri@gmail.com >
2021-03-26 00:16:48 +00:00
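A rough sketch of the invariant this commit establishes (simplified types; the real computation also involves a weighted shuffle): both stages start from the same peer list and drop the slot leader before the tree is built.

```rust
// Hypothetical sketch: derive the turbine tree from tvu_peers on both the
// broadcast and retransmit sides, removing the slot leader *before* the
// neighbors/children split so the two stages agree on the tree.
fn turbine_tree_peers(
    mut tvu_peers: Vec<[u8; 32]>,
    slot_leader: &[u8; 32],
) -> Vec<[u8; 32]> {
    tvu_peers.retain(|peer| peer != slot_leader); // leader excluded first
    // ... weighted-shuffle, then split into neighbors and children ...
    tvu_peers
}
```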
mergify[bot]
dd2d25d698
limits CrdsGossipPull::pull_request_time size ( #15793 ) ( #16097 )
...
There is no pruning logic on CrdsGossipPull::pull_request_time
https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_pull.rs#L172-L174
potentially allowing this map to take too much memory.
Additionally, CrdsGossipPush::last_pushed_to is pruning recent push
timestamps:
https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_push.rs#L275-L279
instead of the older ones.
Co-authored-by: Nathan Hawkins <utsl@utsl.org >
(cherry picked from commit a6c23648cb )
Co-authored-by: behzad nouri <behzadnouri@gmail.com >
2021-03-24 20:05:04 +00:00
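A toy sketch of the pruning direction the commit wants (a linear scan here for clarity; a real implementation would likely use an LRU structure): evict the oldest timestamps, not the most recent ones.

```rust
use std::collections::HashMap;

// Hypothetical sketch: cap pull_request_time by evicting entries with the
// *oldest* timestamps once the map grows past `capacity`.
fn prune_oldest(times: &mut HashMap<[u8; 32], u64>, capacity: usize) {
    while times.len() > capacity {
        let oldest = times
            .iter()
            .min_by_key(|(_, &ts)| ts)
            .map(|(&pubkey, _)| pubkey)
            .expect("map is non-empty");
        times.remove(&oldest);
    }
}
```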
behzad nouri
f2865dfd63
requires stakes for propagating crds values through gossip ( #15561 )
2021-03-12 15:50:14 +00:00
behzad nouri
56923c91bf
limits number of unique pubkeys in the crds table ( #15539 )
2021-03-10 20:46:05 +00:00
behzad nouri
5a9896706c
indexes epoch slots in crds table ( #15459 )
...
ClusterInfo::get_epoch_slots_since scans the entire crds table to obtain
epoch-slots inserted since a timestamp:
https://github.com/solana-labs/solana/blob/013daa8f4/core/src/cluster_info.rs#L1245-L1262
The alternative is to index epoch-slots in the crds table ordered by their
insert timestamp.
2021-02-26 14:12:04 +00:00
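A minimal sketch of such an index (hypothetical types): a BTreeMap keyed by a monotonically increasing insert ordinal turns get_epoch_slots_since into a range query.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: secondary index of epoch-slots entries ordered by
// insert ordinal, so "since X" queries avoid scanning the whole table.
#[derive(Default)]
struct EpochSlotsIndex {
    next_ordinal: u64,
    index: BTreeMap<u64, usize>, // insert ordinal -> crds table slot
}

impl EpochSlotsIndex {
    fn record_insert(&mut self, table_slot: usize) {
        self.index.insert(self.next_ordinal, table_slot);
        self.next_ordinal += 1;
    }
    // All epoch-slots entries inserted at or after `ordinal`.
    fn since(&self, ordinal: u64) -> impl Iterator<Item = usize> + '_ {
        self.index.range(ordinal..).map(|(_, &slot)| slot)
    }
}
```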
carllin
c2e8814dce
Add limit and shrink policy for recycler ( #15320 )
2021-02-24 00:15:58 -08:00
Michael Vines
5df36aec7d
Pacify clippy
2021-02-19 20:08:41 -08:00
behzad nouri
aa3aac766f
adds metrics for inbound/outbound gossip packets counts ( #15407 )
2021-02-19 22:49:35 +00:00
behzad nouri
076c20f1ca
checks that prune-messages have the same inner/outer pubkey ( #15352 )
2021-02-16 21:06:18 +00:00
behzad nouri
0ad063f4e9
adds flag to disable duplicate instance check ( #15006 )
2021-02-03 16:26:17 +00:00
dependabot[bot]
1df93fa2be
chore: bump serde from 1.0.112 to 1.0.118 ( #14828 )
...
* chore: bump serde from 1.0.112 to 1.0.122
Bumps [serde](https://github.com/serde-rs/serde ) from 1.0.112 to 1.0.122.
- [Release notes](https://github.com/serde-rs/serde/releases )
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.122 )
Signed-off-by: dependabot[bot] <support@github.com >
* [auto-commit] Update all Cargo lock files
* Update frozen_abi digest following serde update
* Revert "chore: bump serde from 1.0.112 to 1.0.122"
This reverts commit a3ef4442a4 .
* Revert "[auto-commit] Update all Cargo lock files"
This reverts commit c41c3b005f .
* chore: bump serde from 1.0.112 to 1.0.118
Bumps [serde](https://github.com/serde-rs/serde ) from 1.0.112 to 1.0.118.
- [Release notes](https://github.com/serde-rs/serde/releases )
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.118 )
Signed-off-by: dependabot[bot] <support@github.com >
* [auto-commit] Update all Cargo lock files
* Remove serum-dex pinning
* blind commit!
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <dependabot-buildkite@noreply.solana.com >
Co-authored-by: Ryo Onodera <ryoqun@gmail.com >
2021-02-02 23:28:16 +09:00
behzad nouri
e1021d9f83
removes redundant epoch stakes cache in retransmit ( #14781 )
...
Following d6d76219b , staked nodes computed from vote accounts are
already cached in runtime::Stakes, so the caching in retransmit_stage is
redundant.
2021-01-24 21:15:09 +00:00
behzad nouri
491b059755
broadcasts duplicate shreds through gossip ( #14699 )
2021-01-24 15:47:43 +00:00
behzad nouri
8e581601d6
patches crds vote-index assignment bug ( #14438 )
...
If tower is full, old votes are evicted from the front of the deque:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L367-L373
whereas recent votes, if expired, are evicted from the back:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L529-L537
As a result, from a single tower_index scalar, we cannot infer which crds-vote
should be overwritten:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L576
In addition, there is an off-by-one bug in the existing code. tower_index is
bounded by MAX_LOCKOUT_HISTORY - 1:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/consensus.rs#L382
So, it is at most 30, whereas MAX_VOTES is 32:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L29
Which means that this branch is never taken:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L590-L593
so the crds table always keeps the 29 **oldest** votes by wallclock, and then
only overwrites the 30th one each time (i.e. a tally of only the two most
recent votes).
2021-01-21 13:08:07 +00:00
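A self-contained illustration of the ambiguity (a plain VecDeque standing in for the tower): a full tower evicts from the front while an expired lockout pops from the back, so a single scalar index cannot say which stored vote was replaced.

```rust
use std::collections::VecDeque;

fn main() {
    let mut tower: VecDeque<u64> = (0..5).collect(); // votes on slots 0..4
    tower.pop_front(); // tower full: the *oldest* vote is evicted
    tower.pop_back();  // lockout expired: the *newest* vote is evicted
    tower.push_back(7);
    // The two evictions hit opposite ends of the deque, yet a single
    // tower_index would describe them the same way.
    println!("{:?}", tower); // [1, 2, 3, 7]
}
```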
behzad nouri
b5fd0ed859
rewrites turbine retransmit peers computation ( #14584 )
2021-01-19 04:18:47 +00:00
Michael Vines
9ddd6f08e8
Persist gossip contact info
2020-12-27 20:46:54 -08:00
behzad nouri
2fd38d9912
indexes votes in crds table ( #14272 )
2020-12-27 13:31:05 +00:00
behzad nouri
49019c6613
obtains staked-nodes from the root-bank ( #14257 )
...
... as opposed to the working bank
2020-12-27 13:28:05 +00:00
Michael Vines
ace360ade2
Multiple entrypoint support
2020-12-22 18:35:31 -08:00
Michael Vines
3373082ffa
Update entrypoint contact info even when shred version adoption is not requested
2020-12-22 18:35:31 -08:00
behzad nouri
a14cfd660a
removes &Arc<Self> receivers ( #14234 )
2020-12-22 23:51:53 +00:00
behzad nouri
691031fefd
limits number of crds values returned when responding to pull requests ( #13739 )
...
Crds values buffered when responding to pull-requests can be very large,
taking a lot of memory. This adds a limit on the number of buffered crds
values based on an outbound data budget.
2020-12-18 18:45:12 +00:00
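A hedged sketch of budget-based truncation (hypothetical helper; the real values are serialized crds entries): stop buffering once the outbound byte budget is spent.

```rust
// Hypothetical sketch: keep pre-serialized values only while they fit in
// the remaining outbound data budget; the rest are simply not buffered.
fn take_within_budget(values: Vec<Vec<u8>>, mut budget_bytes: usize) -> Vec<Vec<u8>> {
    values
        .into_iter()
        .take_while(|value| {
            if value.len() <= budget_bytes {
                budget_bytes -= value.len();
                true
            } else {
                false
            }
        })
        .collect()
}
```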
behzad nouri
6a3797e164
adds crds-value for broadcasting duplicate shreds through gossip ( #14133 )
...
In gossip, the header overhead we get from:
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/cluster_info.rs#L434-L435
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L31-L36
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L73
already exceeds SIZE_OF_NONCE in shreds. We also need additional
meta-data (wallclock, source pubkey, ...), which means that given the
SHRED_PAYLOAD_SIZE, we cannot fit all of this in PACKET_DATA_SIZE:
https://github.com/solana-labs/solana/blob/de9ac43eb/ledger/src/shred.rs#L80
On top of that, we need 2 shred payloads as the proof of duplicate. So
each DuplicateShred crds value includes only a chunk of the payload,
along with the meta-data to reconstruct the full payload from the chunks
on the receiving end.
2020-12-18 14:32:43 +00:00
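A rough sketch of the chunked value's shape (field names are illustrative, not the exact crds_value definition):

```rust
// Hypothetical sketch: one gossip-sized chunk of a duplicate-shred proof.
// The two conflicting shred payloads are concatenated, split into chunks,
// and reassembled on the receiving end using the indices below.
struct DuplicateShredChunk {
    from: [u8; 32],  // source pubkey
    wallclock: u64,
    slot: u64,
    shred_index: u32,
    num_chunks: u8,  // total chunks in this proof
    chunk_index: u8, // which chunk this value carries
    chunk: Vec<u8>,  // slice of the concatenated payloads
}
```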
behzad nouri
d6d76219b6
caches staked nodes computed from vote-accounts ( #13929 )
2020-12-17 21:22:50 +00:00
Michael Vines
7143aaa89b
Clippy
2020-12-14 08:03:29 -08:00
behzad nouri
409fe3bca1
adds the instance token to crds-labels for node-instance crds-values ( #14037 )
...
If a node "a" receives instance-info from node "b1" it will override any
instance-info associated with "b1" pubkey in its crds table. This makes
it less likely that when "b1" receives crds values from "a" (either
through pull or push), it sees other instances of itself (because node
"a" discarded them when it received "b1" instance info).
In order for the crds table to contain all instance-info associated with
the same pubkey at the same time, we need to add the instance tokens to
the keys in the crds table (i.e. the CrdsValueLabel).
2020-12-10 17:01:55 +00:00
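A minimal sketch of the key change (simplified variants; the real CrdsValueLabel has many more): the node-instance label carries the instance token alongside the pubkey, so distinct instances get distinct table entries.

```rust
// Hypothetical sketch of the label: NodeInstance values are keyed by both
// the pubkey and the random per-process instance token.
#[derive(PartialEq, Eq, Hash)]
enum CrdsValueLabel {
    ContactInfo([u8; 32]),       // keyed by pubkey only
    NodeInstance([u8; 32], u64), // pubkey *and* instance token
}
```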
behzad nouri
1d267eae6b
std::process::exit to kill all threads
2020-12-09 10:24:23 -08:00
behzad nouri
895d7d6a65
removes RwLock on ClusterInfo.instance
2020-12-09 10:24:23 -08:00
behzad nouri
542198180a
pushes node-instance along with version early in gossip
2020-12-09 10:24:23 -08:00
behzad nouri
8cd5eb9863
checks for duplicate validator instances using gossip
2020-12-09 10:24:23 -08:00
behzad nouri
6706f2b3bb
removes recursive read-locks on gossip ( #13973 )
...
ClusterInfo::tvu_peers acquires a read-lock on gossip:
https://github.com/solana-labs/solana/blob/f0e934145/core/src/cluster_info.rs#L1171-L1185
and so ClusterInfo::repair_peers recursively locks gossip for
read twice:
https://github.com/solana-labs/solana/blob/f0e934145/core/src/cluster_info.rs#L1202-L1223
But std::sync::RwLock is not re-entrant (recursive).
2020-12-06 15:14:49 +00:00
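A sketch of the hazard and the fix (simplified types): std::sync::RwLock can deadlock if the same thread read-locks twice while a writer is queued in between, so the remedy is to acquire the lock once and do all reads under that guard.

```rust
use std::sync::RwLock;

struct Gossip {
    peers: Vec<String>,
}

// Anti-pattern: a helper that takes its own read lock; calling it while
// already holding the lock is a recursive acquisition, which
// std::sync::RwLock does not guarantee to be safe.
fn tvu_peers(gossip: &RwLock<Gossip>) -> Vec<String> {
    gossip.read().unwrap().peers.clone()
}

// Fix sketch: take the read lock once and filter under the same guard,
// never re-entering the lock.
fn repair_peers(gossip: &RwLock<Gossip>) -> Vec<String> {
    let guard = gossip.read().unwrap();
    guard.peers.iter().filter(|p| !p.is_empty()).cloned().collect()
}
```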
behzad nouri
c3048b451d
samples repair peers using WeightedIndex ( #13919 )
...
To output one random sample, weighted_best generates n random numbers:
https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/weighted_shuffle.rs#L38-L63
WeightedIndex does so with only one random number:
https://github.com/rust-random/rand/blob/eb02f0e46/src/distributions/weighted_index.rs#L223-L240
Additionally, if the index is already constructed, each sample takes only
O(log(n)) work; this can be achieved if RepairCache caches the weighted
index:
https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/serve_repair.rs#L83
Also, the repair-peers code can be reorganized to remove redundant
unlock-then-lock code.
2020-12-03 14:26:07 +00:00
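For reference, sampling with rand's WeightedIndex looks roughly like this (the weights here are made up):

```rust
use rand::distributions::{Distribution, WeightedIndex};

fn main() {
    // Stake-like weights for four hypothetical repair peers.
    let weights = [10u64, 1, 100, 5];
    // Building the index does the O(n) prefix-sum work once ...
    let dist = WeightedIndex::new(&weights).unwrap();
    let mut rng = rand::thread_rng();
    // ... and each sample costs one random number plus a binary search.
    let peer = dist.sample(&mut rng);
    println!("sampled peer index: {}", peer);
}
```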
Tyera Eulberg
10c81a2448
Remove rpc_banks from validator ( #13882 )
...
* Remove rpc_banks from validator
* Bump abi-digest
2020-12-02 03:25:09 +00:00
behzad nouri
26bf2b7e45
processes pull-request callers only once per unique caller ( #13750 )
...
process_pull_requests acquires a write lock on crds table to update
records timestamp for each of the pull-request callers:
https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300
However, pull-requests overlap a lot in callers, so this function ends
up doing a lot of redundant work.
This commit obtains unique callers before acquiring an exclusive lock on
crds table.
2020-11-22 17:51:14 +00:00
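A minimal sketch of the reordering (a hypothetical callback stands in for the crds write path): dedupe callers first, then update each unique caller once under the exclusive lock.

```rust
use std::collections::HashSet;

// Hypothetical sketch: collect unique callers *before* taking the write
// lock, so each record timestamp is touched exactly once.
fn update_record_timestamps(
    callers: &[[u8; 32]],
    now: u64,
    mut update: impl FnMut(&[u8; 32], u64),
) {
    let unique: HashSet<&[u8; 32]> = callers.iter().collect();
    // ... acquire the exclusive crds lock here, once ...
    for caller in unique {
        update(caller, now);
    }
}
```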
sakridge
c1eb350c47
Allow contact debug interval to be adjusted ( #13737 )
2020-11-20 14:47:37 -08:00
behzad nouri
b58f69297f
makes crds fields private ( #13703 )
...
Crds fields should maintain several invariants between themselves, so
exposing them as public fields can be bug-prone. In addition, these
invariants are asserted on every write:
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262
which adds extra instructions and is not optimal. Were these fields
private, the asserts would be redundant.
2020-11-19 20:57:40 +00:00
behzad nouri
1ffab5de77
breaks prunes data into chunks to fit into packets ( #13613 )
...
Validator logs show that prune messages are dropped because they exceed
packet data size:
https://github.com/solana-labs/solana/blob/f25c969ad/perf/src/packet.rs#L90-L92
This can exacerbate gossip traffic by redundantly increasing push
messages across the network. The workaround is to break prunes into smaller
chunks and send them over multiple messages.
2020-11-19 16:38:01 +00:00
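A toy sketch of the workaround (sizes are illustrative): cap the number of pruned pubkeys per message so each message serializes under the packet limit.

```rust
// Hypothetical sketch: split a large prune list into bounded chunks, each
// sent as its own message. `max_prunes_per_msg` would be derived from
// PACKET_DATA_SIZE minus the prune-message header, divided by pubkey size.
fn prune_chunks(prunes: &[[u8; 32]], max_prunes_per_msg: usize) -> Vec<Vec<[u8; 32]>> {
    assert!(max_prunes_per_msg > 0);
    prunes
        .chunks(max_prunes_per_msg)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```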
behzad nouri
5e8490ab9d
packs more crds-values in a single gossip packet ( #13500 )
...
split_gossip_messages:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L1536-L1574
splits crds-values into chunks that fit into a gossip packet. However, it
uses a global upper-bound for the header size across all protocols:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L90-L93
This can be wasteful, as the specific gossip protocol can have a smaller
header than this upper-bound (e.g. Protocol::PushMessage is 170 bytes
smaller). Packing more crds-values into one gossip packet avoids the
overhead of separate packets and reduces the total number of bytes sent
over the wire.
This commit updates the splitting function to take a max-chunk-size
argument. At call-site, this value is set to the size of the protocol
which the values are sent over.
2020-11-15 18:23:59 +00:00
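A hedged sketch of the size-aware split (pre-serialized values for simplicity): the caller passes a max-chunk-size equal to the packet limit minus the specific protocol's header.

```rust
// Hypothetical sketch: greedily pack serialized crds values into chunks of
// at most `max_chunk_size` bytes; a value larger than the limit still gets
// its own (oversized) chunk for the caller to reject.
fn split_gossip_messages(values: Vec<Vec<u8>>, max_chunk_size: usize) -> Vec<Vec<Vec<u8>>> {
    let mut chunks = Vec::new();
    let mut chunk: Vec<Vec<u8>> = Vec::new();
    let mut chunk_size = 0;
    for value in values {
        if !chunk.is_empty() && chunk_size + value.len() > max_chunk_size {
            chunks.push(std::mem::take(&mut chunk));
            chunk_size = 0;
        }
        chunk_size += value.len();
        chunk.push(value);
    }
    if !chunk.is_empty() {
        chunks.push(chunk);
    }
    chunks
}
```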
behzad nouri
cbea9ebc34
indexes nodes' contact infos in crds table ( #13553 )
...
In several places in gossip code, the entire crds table is scanned only
to filter out nodes' contact-infos. Currently on mainnet, the crds table
holds ~70k values while there are only ~470 nodes, so the full table scan
is inefficient. Instead, we can maintain an index of only the nodes'
contact-infos.
2020-11-15 16:38:04 +00:00
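A minimal sketch of such an index (simplified key/value types): the set of contact-info keys is kept in sync on every write, so iterating nodes touches ~470 entries instead of ~70k.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch: a side index of which table keys are contact-infos.
struct Crds {
    table: HashMap<[u8; 32], Vec<u8>>, // key -> serialized value, simplified
    nodes: HashSet<[u8; 32]>,          // keys holding contact-infos
}

impl Crds {
    fn insert(&mut self, key: [u8; 32], value: Vec<u8>, is_contact_info: bool) {
        self.table.insert(key, value);
        if is_contact_info {
            self.nodes.insert(key); // index maintained on the write path
        }
    }
}
```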
behzad nouri
73ac104df2
propagates errors out of Packet::from_data ( #13445 )
...
Packet::from_data is ignoring serialization errors:
https://github.com/solana-labs/solana/blob/d08c3232e/sdk/src/packet.rs#L42-L48
This is likely never useful, as the packet will be sent over the wire,
consuming bandwidth, but at the receiving end it will either fail to
deserialize or be invalid.
This commit will propagate the errors out of the function to the
call-site, allowing the call-site to handle the error.
2020-11-08 15:10:03 +00:00
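A sketch of the signature change (assuming bincode serialization, as the sdk uses; the fixed packet buffer handling is elided): the error goes to the caller instead of being ignored.

```rust
// Hypothetical sketch: return the serialization result instead of
// swallowing it, so the call-site can drop the packet before it is sent.
fn packet_from_data<T: serde::Serialize>(data: &T) -> bincode::Result<Vec<u8>> {
    let bytes = bincode::serialize(data)?; // was previously ignored on error
    Ok(bytes)
}
```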
behzad nouri
7f4debdad5
drops older gossip packets when load shedding ( #13364 )
...
Gossip drops incoming packets when overloaded:
https://github.com/solana-labs/solana/blob/f6a73098a/core/src/cluster_info.rs#L2462-L2475
However, newer packets are dropped in favor of the older ones.
This is probably not ideal, as newer packets are more likely to contain
more recent data, so dropping them keeps the validator state lagging.
2020-11-05 17:14:28 +00:00
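A toy sketch of the shedding direction (packets stand in as byte vectors): when over capacity, drop from the older end of the queue.

```rust
// Hypothetical sketch: keep the newest `max_packets` packets, dropping the
// older ones, since newer packets likely carry more recent gossip state.
fn shed_load(mut packets: Vec<Vec<u8>>, max_packets: usize) -> Vec<Vec<u8>> {
    let len = packets.len();
    if len > max_packets {
        packets.drain(..len - max_packets); // discard the oldest packets
    }
    packets
}
```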
behzad nouri
8f0796436a
shares the lock on gossip when processing prune messages ( #13339 )
...
Processing prune messages acquires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L1824-L1825
This can be reduced to a shared lock if active-sets are changed to use
atomic bloom filters:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/crds_gossip_push.rs#L50
2020-11-05 15:42:00 +00:00
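A minimal sketch of an atomic bloom filter (hashing and sizing elided): bits can be set through a shared reference, so prune processing only needs gossip's read lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch: setting a bit needs only &self, unlike a Vec<u64>
// bloom filter, which would require &mut self and hence a write lock.
struct AtomicBloom {
    bits: Vec<AtomicU64>, // assumed non-empty
}

impl AtomicBloom {
    fn set(&self, hash: u64) {
        let num_bits = self.bits.len() as u64 * 64;
        let bit = hash % num_bits;
        self.bits[(bit / 64) as usize].fetch_or(1 << (bit % 64), Ordering::Relaxed);
    }
}
```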
behzad nouri
118ce47b97
measures processing time of each kind of gossip packets ( #13366 )
2020-11-05 15:34:34 +00:00