Commit Graph

30 Commits

Author SHA1 Message Date
mergify[bot]
1be989c5d2 adds fallback logic if retransmit multicast fails (backport #17714) (#18498)
* adds fallback logic if retransmit multicast fails (#17714)

In retransmit-stage, based on the packet.meta.seed and resulting
children/neighbors, each packet is sent to a different set of peers:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457

However, current code errors out as soon as a multicast call fails,
which will skip all the remaining packets:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can exacerbate packets loss in turbine.

This commit:
  * keeps iterating over retransmit packets for loop even if some
    intermediate sends fail.
  * adds a fallback to UdpSocket::send_to if multicast fails.

Recent discord chat:
https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733

(cherry picked from commit be957f25c9)

# Conflicts:
#	core/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-07 22:36:32 +00:00
mergify[bot]
b08c0caefe adds metric for turbine retransmit tree mismatch (backport #17351) (#17392)
* adds metric for turbine retransmit tree mismatch

In order to remove port-based forwarding logic in turbine, we need to
first track how often the turbine retransmit/broadcast trees mismatch
across nodes.
One consistency condition is that if the node is on the critical path
(i.e. the first node in each neighborhood), then we expect that the
packet arrives at tvu socket as opposed to tvu-forwards.

This commit adds a metric to track how often above condition is not met.

(cherry picked from commit 71de021177)

* removes the nested for loop from retransmit-stage

The code can be simplified by just flattening the vector of packets.

(cherry picked from commit ff0e623d30)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-21 20:08:12 +00:00
mergify[bot]
330f42c375 implements cursor for gossip crds table queries (#16952) (#17084)
VersionedCrdsValue.insert_timestamp is used for fetching crds values
inserted since last query:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298

So it is crucial that insert_timestamp does not go backward in time when
new values are inserted into the table. However std::time::SystemTime is
not monotonic, or due to workload, lock contention, thread scheduling,
etc, ... new values may be inserted with a stalled timestamp way in the
past. Additionally, reading system time for the above purpose is
inefficient/unnecessary.

This commit adds an ordinal index to crds values indicating their insert
order. Additionally, it implements a new Cursor type for fetching values
inserted since last query.

(cherry picked from commit fa86a335b0)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-06 15:23:01 +00:00
mergify[bot]
7475a6f444 makes turbine peer computation consistent between broadcast and retransmit (#14910) (#16143)
get_broadcast_peers is using tvu_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/broadcast_stage.rs#L362-L370
which is potentially inconsistent with retransmit_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1332-L1345

Also, the leader does not include its own contact-info when broadcasting
shreds:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1324
but on the retransmit side, slot leader is removed only _after_ neighbors and
children are computed:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/retransmit_stage.rs#L383-L384
So the turbine broadcast tree is different between the two stages.

This commit:
* Removes retransmit_peers. Broadcast and retransmit stages will use tvu_peers
  consistently.
* Retransmit stage removes slot leader _before_ computing children and
  neighbors.

(cherry picked from commit 570fd3f810)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-03-26 00:16:48 +00:00
Trent Nelson
7f7370c306 Re-allow clippy::integer_arithmetic at crate-level 2021-02-17 13:55:08 -07:00
behzad nouri
0ad063f4e9 adds flag to disable duplicate instance check (#15006) 2021-02-03 16:26:17 +00:00
behzad nouri
8e581601d6 patches crds vote-index assignment bug (#14438)
If tower is full, old votes are evicted from the front of the deque:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L367-L373
whereas recent votes if expire are evicted from the back:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L529-L537

As a result, from a single tower_index scalar, we cannot infer which crds-vote
should be overwritten:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L576

In addition there is an off by one bug in the existing code. tower_index is
bounded by MAX_LOCKOUT_HISTORY - 1:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/consensus.rs#L382
So, it is at most 30, whereas MAX_VOTES is 32:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L29
Which means that this branch is never taken:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L590-L593
so crds table alwasys keeps 29 **oldest** votes by wallclock, and then
only overrides the 30st one each time. (i.e a tally of only two most
recent votes).
2021-01-21 13:08:07 +00:00
Michael Vines
daae638781 Add --gossip-validator argument 2020-09-14 20:18:27 -07:00
carllin
b7ed06b17a Cleanup test utilities (#11723)
* Add voting utility

* Add blockstore utility

Co-authored-by: Carl <carl@solana.com>
2020-08-20 05:04:38 +00:00
carllin
6578ad7d08 Speed up local cluster partitioning tests (#11177)
* Fix long local cluster partition tests by skipping slot warmup

Co-authored-by: Carl <carl@solana.com>
2020-07-23 18:50:42 -07:00
Greg Fitzgerald
6ee222363e Move BankForks to solana_runtime (#10637)
* Move BankForks to solana_runtime

* Update imports
2020-06-17 15:27:03 +00:00
Ryo Onodera
a0692c9b4c Further fix the ci (#10559) 2020-06-14 14:53:49 +09:00
anatoly yakovenko
ba83e4ca50 Fix fannout gossip bench (#10509)
* Gossip benchmark

* Rayon tweaking

* push pulls

* fanout to max nodes

* fixup! fanout to max nodes

* fixup! fixup! fanout to max nodes

* update

* multi vote test

* fixup prune

* fast propagation

* fixups

* compute up to 95%

* test for specific tx

* stats

* stats

* fixed tests

* rename

* track a lagging view of which nodes have the local node in their active set in the local received_cache

* test fixups

* dups are old now

* dont prune your own origin

* send vote to tpu

* tests

* fixed tests

* fixed test

* update

* ignore scale

* lint

* fixup

* fixup

* fixup

* cleanup

Co-authored-by: Stephen Akridge <sakridge@gmail.com>
2020-06-13 22:03:38 -07:00
Kristofer Peterson
e23340d89e Clippy cleanup for all targets and nighly rust (also support 1.44.0) (#10445)
* address warnings from 'rustup run beta cargo clippy --workspace'

minor refactoring in:
- cli/src/cli.rs
- cli/src/offline/blockhash_query.rs
- logger/src/lib.rs
- runtime/src/accounts_db.rs

expect some performance improvement AccountsDB::clean_accounts()

* address warnings from 'rustup run beta cargo clippy --workspace --tests'

* address warnings from 'rustup run nightly cargo clippy --workspace --all-targets'

* rustfmt

* fix warning stragglers

* properly fix clippy warnings test_vote_subscribe()
replace ref-to-arc with ref parameters where arc not cloned

* Remove lock around JsonRpcRequestProcessor (#10417)

automerge

* make ancestors parameter optional to avoid forcing construction of empty hash maps

Co-authored-by: Greg Fitzgerald <greg@solana.com>
2020-06-09 09:38:14 +09:00
carllin
bab3502260 Push down cluster_info lock (#9594)
* Push down cluster_info lock

* Rework budget decrement

Co-authored-by: Carl <carl@solana.com>
2020-04-21 12:54:45 -07:00
anatoly yakovenko
9cedeb0a8d Pull streamer out into its own module. (#8917)
automerge
2020-03-17 23:30:23 -07:00
Tyera Eulberg
ab361a8073 Rename KeypairUtil to Signer (#8360)
automerge
2020-02-20 13:28:55 -08:00
carllin
d3712dd26d Factor repair from gossip (#8044) 2020-02-11 13:11:48 -07:00
carllin
fe590da3b6 Revert "Factor repair from gossip (#8044)" (#8143)
This reverts commit e61257695f.
2020-02-06 11:44:20 -08:00
carllin
e61257695f Factor repair from gossip (#8044) 2020-01-31 14:23:50 -08:00
Greg Fitzgerald
a707c9410e More thiserror (#7183)
* Less solana_core::result. Module now private.

* Drop solana_core::result dependency from a few more modules

* Fix warning

* Cleanup

* Fix typo
2020-01-02 20:50:43 -07:00
Greg Fitzgerald
a3a830e1ab Delete Service trait (#6921) 2019-11-13 11:12:09 -07:00
Pankaj Garg
753bd77b41 Use multicast to send retransmit packets (#6319) 2019-10-10 15:02:36 -07:00
Sagar Dhawan
723f9a9b81 Remove unnecessary locking in retransmit stage (#6276)
* Add more detailed metrics to retransmit

* Remove unnecessary locking and add more metrics
2019-10-08 14:41:16 -07:00
Sagar Dhawan
23ea8ae56b Optimize retransmit stage (#6231)
* Optimize retransmit stage

* Remove comment

* Fix test

* Skip iteration to fixup 0 stakes
2019-10-04 11:52:02 -07:00
Michael Vines
3450b9a44d Rename solana to solana-core (#5583) 2019-08-21 10:23:33 -07:00
Pankaj Garg
4798e7fa73 Integrate data shreds (#5541)
* Insert data shreds in blocktree and database

* Integrate data shreds with rest of the code base

* address review comments, and some clippy fixes

* Fixes to some tests

* more test fixes

* ignore some local cluster tests

* ignore replicator local cluster tests
2019-08-20 17:16:06 -07:00
Sagar Dhawan
43f7cd8149 Fix Retransmit slamming the leader with its own blobs (#3938) 2019-04-22 18:41:01 -07:00
Sagar Dhawan
349e8a9462 Ensure forwarded Blobs don't break Erasure (#3907) 2019-04-20 16:44:06 -07:00
Rob Walker
c70412d7bb move core tests to core (#3355)
* move core tests to core

* remove window

* fix up flaky tests

* test_entryfication needs a singly-threaded banking_stage

* move core benches to core

* remove unnecessary dependencies

* remove core as a member for now, test it like runtime

* stop running tests twice

* remove duplicate runs of tests in perf
2019-03-18 22:08:21 -07:00