Commit Graph

301 Commits

Author SHA1 Message Date
Michael Vines
a1ef2bd74d Ignore flaky test_pull_request_time_pruning 2021-04-21 12:07:36 -07:00
behzad nouri
37b8587d4e expands number of erasure coding shreds in the last batch in slots (#16484)
Number of parity coding shreds is always less than the number of data
shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719

Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714

However the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.

This commit expands the number of coding shreds in the last FEC block in
slots to: 64 - number of data shreds; so that FEC blocks are always 64
data and parity coding shreds each.

As a consequence of this, the last FEC block has more parity coding
shreds than data shreds. So for some shred indices we will have a coding
shred but no data shreds. This should not cause any kind of overlapping
FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
2021-04-21 12:47:50 +00:00
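The arithmetic described in this commit message can be sketched as below. This is a minimal illustration, not the code in ledger/src/shred.rs: the constant and function names are hypothetical, and only the last batch of a slot is modelled.

```rust
/// Total number of data plus coding shreds in the last FEC block of a slot,
/// as stated in the commit message.
const LAST_FEC_BLOCK_SIZE: usize = 64;
/// Data shreds are batched in chunks of 32, per the commit message.
const MAX_DATA_SHREDS_PER_BATCH: usize = 32;

/// Hypothetical helper: number of coding shreds generated for the last
/// batch of a slot, so that data + coding shreds always add up to 64.
fn last_batch_coding_shreds(num_data_shreds: usize) -> usize {
    assert!((1..=MAX_DATA_SHREDS_PER_BATCH).contains(&num_data_shreds));
    LAST_FEC_BLOCK_SIZE - num_data_shreds
}
```

For a last batch of 5 data shreds this yields 59 coding shreds, so the block has more coding shreds than data shreds, as the message notes.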
Tyera Eulberg
0924c2d070 Add port and gossip options to solana-test-validator (#16696) 2021-04-21 02:40:52 +00:00
behzad nouri
bc90e04e64 uses current local timestamp when recording purged values
CrdsGossipPull.purged_values is meant to record recently purged values
so that they are excluded from imminent pull requests, until the entire
cluster has synced to the updated value:
https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L449-L454

However, VersionedCrdsValue.local_timestamp represents the local time
when the value was last updated, and given that crds values may have
different timeouts based on stake, it does not necessarily represent how
recently the value was purged:
https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds.rs#L75-L76

As such, recording the current local timestamp when purging values is more
appropriate. Additionally, purge_purged assumes that purged_values is
sorted by timestamp when draining the old ones, which is not true if
those timestamps are VersionedCrdsValue.local_timestamp:
https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L563-L571
2021-04-20 11:21:00 +00:00
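The reasoning above can be sketched with a queue that is only sorted when entries are stamped at purge time. This is an assumption-laden illustration: `PurgedValues`, `record`, and `purge_older_than` are hypothetical names, hashes are reduced to `u64`, and timestamps are plain milliseconds.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for CrdsGossipPull.purged_values:
/// (value hash, time of purge in ms), appended in purge order.
struct PurgedValues {
    entries: VecDeque<(u64, u64)>,
}

impl PurgedValues {
    fn new() -> Self {
        Self { entries: VecDeque::new() }
    }

    /// Records a purged value with the current local time `now`,
    /// not the time the value was last updated.
    fn record(&mut self, hash: u64, now: u64) {
        self.entries.push_back((hash, now));
    }

    /// Drops entries purged before `min_ts`. Stopping at the first recent
    /// entry is only correct because entries are sorted by timestamp.
    fn purge_older_than(&mut self, min_ts: u64) {
        while let Some(&(_, ts)) = self.entries.front() {
            if ts >= min_ts {
                break;
            }
            self.entries.pop_front();
        }
    }
}
```

If `record` were instead given each value's last-update time, the queue could be out of order and the early `break` would leave stale entries behind.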
Michael Vines
c8b474cd0b Send votes to next leader's TPU instead of our TPU 2021-04-20 00:38:21 -07:00
Michael Vines
b06e93fe5b Increase test timeout 2021-04-18 20:55:02 -07:00
behzad nouri
e405747409 Revert "Add limit and shrink policy for recycler (#15320)"
This reverts commit c2e8814dce.
2021-04-18 19:29:24 +00:00
behzad nouri
d92721aab9 uses timeouts based on stake for filtering pull responses (#16549)
filter_pull_responses is using default timeout when discarding pull
responses (except for ContactInfo):
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L349-L350

But purging code uses timeouts based on stake:
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L1867-L1870

So the crds value will not be purged from the sender's table and will be
sent again over the next pull request.
2021-04-14 20:18:00 +00:00
behzad nouri
f35a6a8be0 prioritizes contact-infos in pull responses (#16541)
Expired crds values where the contact-info does not exist are wasted:
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L353-L378
and then are sent again over the next pull-request.

Also, the stake of the first response (which can be anything) is used to
weight all pull-responses to a node, while the rest of the responses can
have different stakes.
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L2231
2021-04-14 18:45:20 +00:00
Justin Starry
85eb37fab0 Merge pull request from GHSA-8v47-8c53-wwrc
* Track transaction check time separately from account loads

* banking packet process metrics

* Remove signature clone in status cache lookup

* Reduce allocations when converting packets to transactions

* Add blake3 hash of transaction messages in status cache

* Bug fixes

* fix tests and run fmt

* Address feedback

* fix simd tx entry verification

* Fix rebase

* Feedback

* clean up

* Add tests

* Remove feature switch and fall back to signature check

* Bump programs/bpf Cargo.lock

* clippy

* nudge benches

* Bump `BankSlotDelta` frozen ABI hash

* Add blake3 to sdk/programs/Cargo.lock

* nudge bpf tests

* short circuit status cache checks

Co-authored-by: Trent Nelson <trent@solana.com>
2021-04-13 00:28:08 -06:00
Christian Drappi
54a04bac3d Apple M1 compatibility (#16346)
Co-authored-by: Christian Drappi <christiandrappi@Christians-MacBook-Pro.local>
2021-04-09 17:21:01 -07:00
behzad nouri
22a18a68e3 stops consuming pinned vectors with a recycler (#16441)
If the vector is pinned and has a recycler, From<PinnedVec>
implementation of Vec should clone (instead of consuming) the underlying
vector so that the next allocation of a PinnedVec will recycle an
already pinned one.
2021-04-09 16:55:24 +00:00
Trent Nelson
b71875df61 cluster-info: Get rid of some integer math while we're here 2021-04-06 00:09:37 +00:00
Trent Nelson
b6b08706b9 cluster-info: Don't subtract non-shred spies from node count 2021-04-06 00:09:37 +00:00
behzad nouri
b041b55028 makes test_pull_request_time_pruning smaller (#16128) 2021-03-25 22:44:43 +00:00
behzad nouri
a6c23648cb limits CrdsGossipPull::pull_request_time size (#15793)
There is no pruning logic on CrdsGossipPull::pull_request_time
https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_pull.rs#L172-L174
potentially allowing this to take too much memory.

Additionally, CrdsGossipPush::last_pushed_to is pruning recent push
timestamps:
https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_push.rs#L275-L279
instead of the older ones.

Co-authored-by: Nathan Hawkins <utsl@utsl.org>
2021-03-24 18:33:56 +00:00
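One way to bound such a timestamp map while keeping the most recent entries is sketched below. This is not the pruning code from the commit: `prune_oldest` is a hypothetical helper, and peers and timestamps are simplified to `u64`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: shrinks `times` (peer -> last request time) to at
/// most `cap` entries by removing the entries with the OLDEST timestamps.
fn prune_oldest(times: &mut HashMap<u64, u64>, cap: usize) {
    if times.len() <= cap {
        return;
    }
    // Collect (timestamp, peer) pairs and sort ascending by timestamp.
    let mut by_age: Vec<(u64, u64)> = times.iter().map(|(&peer, &ts)| (ts, peer)).collect();
    by_age.sort_unstable();
    let excess = times.len() - cap;
    for &(_, peer) in by_age.iter().take(excess) {
        times.remove(&peer);
    }
}
```

Sorting in the opposite direction would drop the recent timestamps instead, which is the mistake the message describes for last_pushed_to.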
behzad nouri
570fd3f810 makes turbine peer computation consistent between broadcast and retransmit (#14910)
get_broadcast_peers is using tvu_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/broadcast_stage.rs#L362-L370
which is potentially inconsistent with retransmit_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1332-L1345

Also, the leader does not include its own contact-info when broadcasting
shreds:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1324
but on the retransmit side, slot leader is removed only _after_ neighbors and
children are computed:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/retransmit_stage.rs#L383-L384
So the turbine broadcast tree is different between the two stages.

This commit:
* Removes retransmit_peers. Broadcast and retransmit stages will use tvu_peers
  consistently.
* Retransmit stage removes slot leader _before_ computing children and
  neighbors.
2021-03-24 13:34:48 +00:00
behzad nouri
f2865dfd63 requires stakes for propagating crds values through gossip (#15561) 2021-03-12 15:50:14 +00:00
behzad nouri
56923c91bf limits number of unique pubkeys in the crds table (#15539) 2021-03-10 20:46:05 +00:00
behzad nouri
5a9896706c indexes epoch slots in crds table (#15459)
ClusterInfo::get_epoch_slots_since scans the entire crds table to obtain
epoch-slots inserted since a timestamp:
https://github.com/solana-labs/solana/blob/013daa8f4/core/src/cluster_info.rs#L1245-L1262
The alternative is to index epoch-slots in crds table ordered by their
insert timestamp.
2021-02-26 14:12:04 +00:00
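The indexing idea can be illustrated with an ordered map keyed by insert timestamp, so that a "since" query becomes a range lookup rather than a full scan. This is a sketch under assumptions: `EpochSlotsIndex` and its methods are hypothetical, values are reduced to string labels, and "since" is taken to mean strictly after the given timestamp.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical index of epoch-slots entries ordered by insert timestamp.
/// The second key component is a sequence number that keeps entries with
/// equal timestamps distinct.
struct EpochSlotsIndex {
    by_insert_time: BTreeMap<(u64, u64), String>,
    next_seq: u64,
}

impl EpochSlotsIndex {
    fn new() -> Self {
        Self { by_insert_time: BTreeMap::new(), next_seq: 0 }
    }

    fn insert(&mut self, insert_ts: u64, label: &str) {
        self.by_insert_time.insert((insert_ts, self.next_seq), label.to_string());
        self.next_seq += 1;
    }

    /// Returns labels inserted strictly after `ts`, in insertion order,
    /// using a range query instead of scanning every entry.
    fn since(&self, ts: u64) -> Vec<&str> {
        let lower = Bound::Excluded((ts, u64::MAX));
        self.by_insert_time
            .range::<(u64, u64), _>((lower, Bound::Unbounded))
            .map(|(_, label)| label.as_str())
            .collect()
    }
}
```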
carllin
c2e8814dce Add limit and shrink policy for recycler (#15320) 2021-02-24 00:15:58 -08:00
Michael Vines
5df36aec7d Pacify clippy 2021-02-19 20:08:41 -08:00
behzad nouri
aa3aac766f adds metrics for inbound/outbound gossip packets counts (#15407) 2021-02-19 22:49:35 +00:00
behzad nouri
076c20f1ca checks that prune-messages have the same inner/outer pubkey (#15352) 2021-02-16 21:06:18 +00:00
behzad nouri
0ad063f4e9 adds flag to disable duplicate instance check (#15006) 2021-02-03 16:26:17 +00:00
dependabot[bot]
1df93fa2be chore: bump serde from 1.0.112 to 1.0.118 (#14828)
* chore: bump serde from 1.0.112 to 1.0.122

Bumps [serde](https://github.com/serde-rs/serde) from 1.0.112 to 1.0.122.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.122)

Signed-off-by: dependabot[bot] <support@github.com>

* [auto-commit] Update all Cargo lock files

* Update frozen_abi digest following serde update

* Revert "chore: bump serde from 1.0.112 to 1.0.122"

This reverts commit a3ef4442a4.

* Revert "[auto-commit] Update all Cargo lock files"

This reverts commit c41c3b005f.

* chore: bump serde from 1.0.112 to 1.0.118

Bumps [serde](https://github.com/serde-rs/serde) from 1.0.112 to 1.0.118.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.118)

Signed-off-by: dependabot[bot] <support@github.com>

* [auto-commit] Update all Cargo lock files

* Remove serum-dex pinning

* blind commit!

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <dependabot-buildkite@noreply.solana.com>
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2021-02-02 23:28:16 +09:00
behzad nouri
e1021d9f83 removes redundant epoch stakes cache in retransmit (#14781)
Following d6d76219b, staked nodes computed from vote accounts are
already cached in runtime::Stakes, so the caching in retransmit_stage is
redundant.
2021-01-24 21:15:09 +00:00
behzad nouri
491b059755 broadcasts duplicate shreds through gossip (#14699) 2021-01-24 15:47:43 +00:00
behzad nouri
8e581601d6 patches crds vote-index assignment bug (#14438)
If tower is full, old votes are evicted from the front of the deque:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L367-L373
whereas recent votes, if expired, are evicted from the back:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L529-L537

As a result, from a single tower_index scalar, we cannot infer which crds-vote
should be overwritten:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L576

In addition, there is an off-by-one bug in the existing code. tower_index is
bounded by MAX_LOCKOUT_HISTORY - 1:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/consensus.rs#L382
So, it is at most 30, whereas MAX_VOTES is 32:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L29
Which means that this branch is never taken:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L590-L593
so the crds table always keeps the 29 **oldest** votes by wallclock, and then
only overrides the 30th one each time (i.e. a tally of only the two most
recent votes).
2021-01-21 13:08:07 +00:00
behzad nouri
b5fd0ed859 rewrites turbine retransmit peers computation (#14584) 2021-01-19 04:18:47 +00:00
Michael Vines
9ddd6f08e8 Persist gossip contact info 2020-12-27 20:46:54 -08:00
behzad nouri
2fd38d9912 indexes votes in crds table (#14272) 2020-12-27 13:31:05 +00:00
behzad nouri
49019c6613 obtains staked-nodes from the root-bank (#14257)
... as opposed to the working bank
2020-12-27 13:28:05 +00:00
Michael Vines
ace360ade2 Multiple entrypoint support 2020-12-22 18:35:31 -08:00
Michael Vines
3373082ffa Update entrypoint contact info even when shred version adoption is not requested 2020-12-22 18:35:31 -08:00
behzad nouri
a14cfd660a removes &Arc<Self> receivers (#14234) 2020-12-22 23:51:53 +00:00
behzad nouri
691031fefd limits number of crds values returned when responding to pull requests (#13739)
Crds values buffered when responding to pull-requests can be very large, taking a lot of memory.
Added a limit on the number of buffered crds values based on the outbound data budget.
2020-12-18 18:45:12 +00:00
behzad nouri
6a3797e164 adds crds-value for broadcasting duplicate shreds through gossip (#14133)
In gossip, the header overhead we get from:
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/cluster_info.rs#L434-L435
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L31-L36
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L73
already exceeds SIZE_OF_NONCE in shreds. We also need additional
meta-data (wallclock, source pubkey, ...). Which means that given the
SHRED_PAYLOAD_SIZE, we cannot fit all these in PACKET_DATA_SIZE:
https://github.com/solana-labs/solana/blob/de9ac43eb/ledger/src/shred.rs#L80

On top of that, we need 2 shred payloads as the proof of duplicate. So
each DuplicateShred crds value includes only a chunk of the payload,
along with the meta-data to reconstruct the full payload from the chunks
on the receiving end.
2020-12-18 14:32:43 +00:00
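The chunking scheme described above can be sketched as follows. This is illustrative only and does not reproduce the actual DuplicateShred layout: the `Chunk` struct, its fields, and both functions are hypothetical, and the real crds value carries more meta-data (wallclock, source pubkey, and so on).

```rust
/// Hypothetical chunk of a larger payload, with enough meta-data to
/// reassemble the payload on the receiving end.
#[derive(Clone, Debug, PartialEq)]
struct Chunk {
    chunk_index: u8,
    num_chunks: u8,
    data: Vec<u8>,
}

/// Splits `payload` into chunks of at most `max_chunk_size` bytes each.
fn split_payload(payload: &[u8], max_chunk_size: usize) -> Vec<Chunk> {
    assert!(max_chunk_size > 0);
    let pieces: Vec<&[u8]> = payload.chunks(max_chunk_size).collect();
    assert!(pieces.len() <= usize::from(u8::MAX));
    let num_chunks = pieces.len() as u8;
    pieces
        .into_iter()
        .enumerate()
        .map(|(i, data)| Chunk { chunk_index: i as u8, num_chunks, data: data.to_vec() })
        .collect()
}

/// Rebuilds the payload from chunks received in any order.
/// Returns None if a chunk is missing, duplicated, or inconsistent.
fn reassemble(mut chunks: Vec<Chunk>) -> Option<Vec<u8>> {
    let num_chunks = chunks.first()?.num_chunks;
    if chunks.len() != usize::from(num_chunks)
        || chunks.iter().any(|c| c.num_chunks != num_chunks)
    {
        return None;
    }
    chunks.sort_by_key(|c| c.chunk_index);
    // After sorting, indices must be exactly 0, 1, ..., num_chunks - 1.
    if chunks.iter().enumerate().any(|(i, c)| usize::from(c.chunk_index) != i) {
        return None;
    }
    Some(chunks.into_iter().flat_map(|c| c.data).collect())
}
```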
behzad nouri
d6d76219b6 caches staked nodes computed from vote-accounts (#13929) 2020-12-17 21:22:50 +00:00
Michael Vines
7143aaa89b Clippy 2020-12-14 08:03:29 -08:00
behzad nouri
409fe3bca1 adds the instance token to crds-labels for node-instance crds-values (#14037)
If a node "a" receives instance-info from node "b1" it will override any
instance-info associated with "b1" pubkey in its crds table. This makes
it less likely that when "b1" receives crds values from "a" (either
through pull or push), it sees other instances of itself (because node
"a" discarded them when it received "b1" instance info).

In order for the crds table to contain all instance-info associated with
the same pubkey at the same time, we need to add the instance tokens to
the keys in the crds table (i.e. the CrdsValueLabel).
2020-12-10 17:01:55 +00:00
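The effect of adding the instance token to the table key can be shown with a small sketch. The types and function names here are hypothetical simplifications (pubkeys and tokens as `u64`), not the actual CrdsValueLabel definition.

```rust
use std::collections::{HashMap, HashSet};

type Pubkey = u64; // hypothetical stand-in for a node pubkey
type Token = u64; // hypothetical stand-in for an instance token

/// Number of instance entries retained when the table key is the pubkey
/// alone: a later instance-info overrides the earlier one.
fn instances_keyed_by_pubkey(seen: &[(Pubkey, Token)]) -> usize {
    let mut table: HashMap<Pubkey, Token> = HashMap::new();
    for &(pubkey, token) in seen {
        table.insert(pubkey, token);
    }
    table.len()
}

/// Number of instance entries retained when the key also includes the
/// instance token: all instances of the same pubkey coexist.
fn instances_keyed_by_pubkey_and_token(seen: &[(Pubkey, Token)]) -> usize {
    let table: HashSet<(Pubkey, Token)> = seen.iter().copied().collect();
    table.len()
}
```

With two instances of the same pubkey, the first keying keeps one entry and the second keeps both, which is what allows a node to see other instances of itself.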
behzad nouri
1d267eae6b std::process::exit to kill all threads 2020-12-09 10:24:23 -08:00
behzad nouri
895d7d6a65 removes RwLock on ClusterInfo.instance 2020-12-09 10:24:23 -08:00
behzad nouri
542198180a pushes node-instance along with version early in gossip 2020-12-09 10:24:23 -08:00
behzad nouri
8cd5eb9863 checks for duplicate validator instances using gossip 2020-12-09 10:24:23 -08:00
behzad nouri
6706f2b3bb removes recursive read-locks on gossip (#13973)
ClusterInfo::tvu_peers acquires a read-lock on gossip:
https://github.com/solana-labs/solana/blob/f0e934145/core/src/cluster_info.rs#L1171-L1185
and so, ClusterInfo::repair_peers is recursively locking gossip for
read twice:
https://github.com/solana-labs/solana/blob/f0e934145/core/src/cluster_info.rs#L1202-L1223
But std::sync::RwLock is not re-entrant (recursive).
2020-12-06 15:14:49 +00:00
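One common way to avoid taking the same read lock twice is to move the shared logic into a helper that works on already-locked data. The sketch below shows that pattern only; `ClusterInfoSketch`, the peer filters, and `tvu_peers_locked` are hypothetical and are not the functions changed by this commit.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for ClusterInfo, with gossip reduced to a list
/// of peer ids.
struct ClusterInfoSketch {
    gossip: RwLock<Vec<u64>>,
}

impl ClusterInfoSketch {
    /// Helper that works on already-locked data and takes no lock itself.
    /// The even-id filter is an arbitrary placeholder.
    fn tvu_peers_locked(gossip: &[u64]) -> Vec<u64> {
        gossip.iter().copied().filter(|peer| peer % 2 == 0).collect()
    }

    fn tvu_peers(&self) -> Vec<u64> {
        let gossip = self.gossip.read().unwrap();
        Self::tvu_peers_locked(&gossip)
    }

    fn repair_peers(&self) -> Vec<u64> {
        let gossip = self.gossip.read().unwrap();
        // Calling self.tvu_peers() here would acquire a second read lock on
        // the same thread, so the lock-free helper is used instead.
        Self::tvu_peers_locked(&gossip)
            .into_iter()
            .filter(|peer| *peer > 2)
            .collect()
    }
}
```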
behzad nouri
c3048b451d samples repair peers using WeightedIndex (#13919)
To output one random sample, weighted_best generates n random numbers:
https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/weighted_shuffle.rs#L38-L63
WeightedIndex does so with only one random number:
https://github.com/rust-random/rand/blob/eb02f0e46/src/distributions/weighted_index.rs#L223-L240
Additionally, if the index is already constructed, it only does a total
of O(log(n)) work, which can be achieved if RepairCache caches the
weighted index:
https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/serve_repair.rs#L83

Also, the repair-peers code can be reorganized to have less redundant
unlock-then-lock code.
2020-12-03 14:26:07 +00:00
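The idea of drawing one weighted sample from a single random number can be sketched with cumulative weights and a binary search. This is a standard-library illustration in the spirit of a weighted index, not the rand crate's implementation: `WeightedIndexSketch` is hypothetical, and the caller supplies the random number `r` so that no random-number crate is needed.

```rust
/// Hypothetical weighted index: stores cumulative weights so that a sample
/// needs one random number and one binary search.
struct WeightedIndexSketch {
    cumulative: Vec<u64>,
}

impl WeightedIndexSketch {
    /// Builds the index in O(n). Returns None if the weights overflow or
    /// sum to zero.
    fn new(weights: &[u64]) -> Option<Self> {
        let mut total = 0u64;
        let mut cumulative = Vec::with_capacity(weights.len());
        for &weight in weights {
            total = total.checked_add(weight)?;
            cumulative.push(total);
        }
        if total == 0 {
            None
        } else {
            Some(Self { cumulative })
        }
    }

    fn total(&self) -> u64 {
        *self.cumulative.last().unwrap()
    }

    /// Maps one random number `r` in [0, total) to an index in O(log n).
    fn sample_with(&self, r: u64) -> usize {
        assert!(r < self.total());
        // First index whose cumulative weight is strictly greater than r.
        self.cumulative.partition_point(|&c| c <= r)
    }
}
```

Once built, the index can be cached and reused, so each further sample costs only the binary search.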
Tyera Eulberg
10c81a2448 Remove rpc_banks from validator (#13882)
* Remove rpc_banks from validator

* Bump abi-digest
2020-12-02 03:25:09 +00:00
behzad nouri
26bf2b7e45 processes pull-request callers only once per unique caller (#13750)
process_pull_requests acquires a write lock on the crds table to update
the record timestamps for each of the pull-request callers:
https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300
However, pull-requests overlap a lot in callers and this function ends
up doing a lot of redundant work.

This commit obtains unique callers before acquiring an exclusive lock on
crds table.
2020-11-22 17:51:14 +00:00
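The change described above, deduplicating before taking the exclusive lock, can be sketched as follows. This is a minimal illustration with hypothetical names: the table is reduced to a map from caller id to timestamp, and `record_callers` is not the actual process_pull_requests.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

/// Hypothetical helper: updates the timestamp of each distinct caller.
/// Duplicates are removed BEFORE the write lock is taken, so the lock is
/// held only for one update per unique caller.
fn record_callers(table: &RwLock<HashMap<u64, u64>>, callers: &[u64], now: u64) -> usize {
    let unique: HashSet<u64> = callers.iter().copied().collect();
    let mut table = table.write().unwrap();
    for &caller in &unique {
        table.insert(caller, now);
    }
    unique.len()
}
```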
sakridge
c1eb350c47 Allow contact debug interval to be adjusted (#13737) 2020-11-20 14:47:37 -08:00