Commit Graph

305 Commits

Author SHA1 Message Date
behzad nouri
c3048b451d samples repair peers using WeightedIndex (#13919)
To output one random sample, weighted_best generates n random numbers:
https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/weighted_shuffle.rs#L38-L63
WeightedIndex does so with only one random number:
https://github.com/rust-random/rand/blob/eb02f0e46/src/distributions/weighted_index.rs#L223-L240
Additionally, if the index is already constructed, it only does a total
of O(log(n)) amount of work; which can be achieved if RepairCache,
caches the weighted index:
https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/serve_repair.rs#L83

Also, the repair-peers code can be reorganized to have fewer redundant
unlock-then-lock code.
2020-12-03 14:26:07 +00:00
Tyera Eulberg
10c81a2448 Remove rpc_banks from validator (#13882)
* Remove rpc_banks from validator

* Bump abi-digest
2020-12-02 03:25:09 +00:00
behzad nouri
26bf2b7e45 processes pull-request callers only once per unique caller (#13750)
process_pull_requests acquires a write lock on crds table to update
records timestamp for each of the pull-request callers:
https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300
However, pull-requests overlap a lot in callers and this function ends
up doing a lot of redundant duplicate work.

This commit obtains unique callers before acquiring an exclusive lock on
crds table.
2020-11-22 17:51:14 +00:00
sakridge
c1eb350c47 Allow contact debug interval to be adjusted (#13737) 2020-11-20 14:47:37 -08:00
behzad nouri
b58f69297f makes crds fields private (#13703)
Crds fields should maintain several invariants between themselves, so
exposing them as public fields can be bug prone. In addition these
invariants are asserted on every write:
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262
which adds extra instructions and is not optimal. Should these fields be
private the asserts will be redundant.
2020-11-19 20:57:40 +00:00
behzad nouri
1ffab5de77 breaks prunes data into chunks to fit into packets (#13613)
Validator logs show that prune messages are dropped because they exceed
packet data size:
https://github.com/solana-labs/solana/blob/f25c969ad/perf/src/packet.rs#L90-L92
This can exacerbate gossip traffic by redundantly increasing push
messages across network. The workaround is to break prunes into smaller
chunks and send over in multiple messages.
2020-11-19 16:38:01 +00:00
behzad nouri
5e8490ab9d packs more crds-values in a single gossip packet (#13500)
split_gossip_messages:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L1536-L1574
splits crds-values into chunks to fit into a gossip packet. However it is
using a global upper-bound for the header-size across all protocols:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L90-L93
This can be wasteful as the specific gossip protocol can have smaller
header than this upper-bound (e.g. Protocol::PushMessage is 170 bytes
smaller). Adding more crds-values in one gossip packet can avoid the
overheads of separate packets and reduce total number of bytes sent over
the wire.

This commit updates the splitting function to take a max-chunk-size
argument. At call-site, this value is set to the size of the protocol
which the values are sent over.
2020-11-15 18:23:59 +00:00
behzad nouri
cbea9ebc34 indexes nodes' contact infos in crds table (#13553)
In several places in gossip code, the entire crds table is scanned only
to filter out nodes' contact infos. Currently on mainnet, crds table is
of size ~70k, while there are only ~470 nodes. So the full table scan is
inefficient. Instead we may maintain an index of only nodes' contact
infos.
2020-11-15 16:38:04 +00:00
behzad nouri
73ac104df2 propagates errors out of Packet::from_data (#13445)
Packet::from_data is ignoring serialization errors:
https://github.com/solana-labs/solana/blob/d08c3232e/sdk/src/packet.rs#L42-L48
This is likely never useful as the packet will be sent over the wire
taking bandwidth but at the receiving end will either fail to
deserialize or it will be invalid.
This commit will propagate the errors out of the function to the
call-site, allowing the call-site to handle the error.
2020-11-08 15:10:03 +00:00
behzad nouri
7f4debdad5 drops older gossip packets when load shedding (#13364)
Gossip drops incoming packets when overloaded:
https://github.com/solana-labs/solana/blob/f6a73098a/core/src/cluster_info.rs#L2462-L2475
However newer packets are dropped in favor of the older ones.
This is probably not ideal as newer packets are more likely to contain
more recent data, so dropping them will keep the validator state
lagging.
2020-11-05 17:14:28 +00:00
behzad nouri
8f0796436a shares the lock on gossip when processing prune messages (#13339)
Processing prune messages acquires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L1824-L1825
This can be reduced to a shared lock if active-sets are changed to use
atomic bloom filters:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/crds_gossip_push.rs#L50
2020-11-05 15:42:00 +00:00
behzad nouri
118ce47b97 measures processing time of each kind of gossip packets (#13366) 2020-11-05 15:34:34 +00:00
behzad nouri
10fa4f45ab uses thread-pool when handling push messages (#13338)
From runtime profiles, the majority time of solana-listen thread:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L2720
is spent handling push messages. The code here:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L2272-L2364
may utilize the idle gossip thread-pool.
2020-11-04 19:15:58 +00:00
Michael Vines
df8dab9d2b Native/builtin programs now receive an InvokeContext 2020-10-29 21:45:24 -07:00
behzad nouri
3738611f5c adds more parallel processing to gossip packets handling (#12988) 2020-10-29 15:17:19 +00:00
behzad nouri
ae91270961 implements ping-pong packets between nodes (#12794)
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
2020-10-28 17:03:02 +00:00
behzad nouri
4bfda3e766 marks pull request creation time only once per peer (#13113)
mark_pull_request_creation time requires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/16944e218/core/src/cluster_info.rs#L1547-L1548
Current code is redundantly marking each peer once for each request.
There are at most only 2 unique peers, whereas there are hundreds of
requests per each. So the lock is acquired hundreds of time longer than
necessary.
2020-10-26 17:11:31 +00:00
Michael Vines
a4956844bd Update frozen_abi hashes
The movement of files in sdk/ caused ABI hashes to change
2020-10-24 08:37:55 -07:00
behzad nouri
37c8842bcb scans crds table in parallel for finding old labels (#13073)
From runtime profiles, the majority time of ClusterInfo::handle_purge
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626
is spent scanning crds table finding old labels:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197

This can be done in parallel given that gossip thread-pool:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641
is idle when handle_purge is invoked:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681
2020-10-23 14:17:37 +00:00
Justin Starry
c95f6c4b83 Remove spammy invalid rpc log (#13100) 2020-10-23 07:05:29 +00:00
Justin Starry
8b0242a5d8 Allow nodes to advertise a different rpc address over gossip (#13053)
* Allow nodes to advertise a different rpc address over gossip

* Feedback
2020-10-22 03:31:48 +00:00
Michael Vines
959880db60 Remove unused pubkey::Pubkey imports 2020-10-21 19:08:13 -07:00
Michael Vines
7bc073defe Run codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand 2020-10-21 19:08:13 -07:00
behzad nouri
75d62ca095 improves threads' utilization in processing gossip packets (#12962)
ClusterInfo::process_packets handles incoming packets in a thread_pool:
https://github.com/solana-labs/solana/blob/87311cce7/core/src/cluster_info.rs#L2118-L2134

However, profiling runtime shows that threads are not well utilized and
a lot of the processing is done sequentially.

This commit redistributes the work done in parallel. Testing on a gce
cluster shows 20%+ improvement in processing gossip packets with much
smaller variations.
2020-10-19 19:03:38 +00:00
behzad nouri
48283161c3 passes through feature-set to gossip requests handling (#12878)
* passes through feature-set to down to gossip requests handling
* takes the feature-set from root_bank instead of working_bank
2020-10-15 20:54:21 +00:00
behzad nouri
05cf15a382 implements DataBudget using atomics (#12856) 2020-10-15 11:33:58 +00:00
sakridge
1f1eb9f26e Add separate push queue to reduce push lock contention (#12713) 2020-10-13 18:10:25 -07:00
behzad nouri
1866521df6 retains hash value of outdated responses received from pull requests (#12513)
pull_response_fail_inserts has been increasing:
https://cdn.discordapp.com/attachments/478692221441409024/759096187587657778/pull_response_fail_insert.png
but for outdated values which fail to insert:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L332-L344
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds.rs#L104-L108
are not recorded anywhere, and so the next pull request may obtain the
same redundant payload again, unnecessary taking bandwidth.

This commit holds on to the hashes of failed-inserts for a while, similar
to purged_values:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L380
and filter them out for the next pull request:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L204
2020-10-01 00:39:22 +00:00
behzad nouri
537bbde22e builds crds filters in parallel (#12360)
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.

This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
2020-09-29 23:06:02 +00:00
behzad nouri
0d5258b6d3 separates out ClusterInfo::{gossip,listen} thread-pools (#12535)
https://github.com/solana-labs/solana/pull/12402
moved gossip-work threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/cluster_info.rs#L2330-L2334
to ClusterInfo::new as a new field in the ClusterInfo struct:
https://github.com/solana-labs/solana/blob/35208c5ee/core/src/cluster_info.rs#L249
So that they can be shared between listen and gossip threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67

However, in testing https://github.com/solana-labs/solana/pull/12360
it turned out this will cause breakage:
https://buildkite.com/solana-labs/solana/builds/31646
https://buildkite.com/solana-labs/solana/builds/31651
https://buildkite.com/solana-labs/solana/builds/31655
Whereas with separate thread pools all is good. It might be the case
that one thread is slowing down the other by exhausting the thread-pool
whereas with separate thread-pools we get fair scheduling guarantees
from the os.

This commit reverts https://github.com/solana-labs/solana/pull/12402
and instead adds separate thread-pools for listen and gossip threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67
2020-09-29 09:05:31 +00:00
Michael Vines
35f5f9fc7b Add feature set identifier to gossiped version information 2020-09-25 11:40:36 -07:00
behzad nouri
42f1ef8acb moves gossip-work thread pool cons to ClusterInfo::new (#12402) 2020-09-24 18:36:31 +00:00
Michael Vines
daae638781 Add --gossip-validator argument 2020-09-14 20:18:27 -07:00
carllin
0f0a2ddafe Filter push/pulls from spies (#11620)
* Filter push/pulls from spies

* Don't pull from peers with shred version == 0, don't push to people with shred_version == 0

Co-authored-by: Carl <carl@solana.com>
2020-08-18 18:52:45 -07:00
Michael Vines
d15173ad9d Address latest nightly clippy lints, but globally disable stable_sort_primitive 2020-08-17 22:36:10 -07:00
sakridge
54137e3446 Add incoming pull response counter (#11591) 2020-08-12 14:07:05 -07:00
carllin
1b238dd63e Gossip log (#11555)
Co-authored-by: Carl <carl@solana.com>
2020-08-11 21:03:54 +00:00
anatoly yakovenko
713851b68d filter out old gossip pull requests (#11448)
* init

* builds

* stats

* revert

* tests

* clippy

* add some jitter

* shorter jitter timer

* update

* fixup! update

* use saturating_sub

* fix filters
2020-08-11 06:26:42 -07:00
Greg Fitzgerald
bad486823c Add a client for BankForks (#10728)
Also:
* Use BanksClient in solana-tokens
2020-08-07 08:45:17 -06:00
Michael Vines
eefcf484cb clippy 2020-08-03 18:35:15 +00:00
Ryo Onodera
39b3ac6a8d Introduce automatic ABI maintenance mechanism (2/2; rollout) (#8012)
* Introduce automatic ABI maintenance mechanism (2/2; rollout)

* Fix stable clippy

* Change to symlink

* Freeze abi of Tower

* fmt...

* Improve dev-experience!

* Update BankSlotDelta

$ diff -u /tmp/abi8/*7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj* /tmp/abi8/*9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w*
--- /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj      2020-06-18 18:01:22.831228087 +0900
+++ /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w      2020-07-03 15:59:58.430695244 +0900
@@ -140,7 +140,7 @@
                                                         field u8
                                                             primitive u8
                                                         field solana_sdk::instruction::InstructionError
-                                                            enum InstructionError (variants = 34)
+                                                            enum InstructionError (variants = 35)
                                                                 variant(0) GenericError (unit)
                                                                 variant(1) InvalidArgument (unit)
                                                                 variant(2) InvalidInstructionData (unit)
@@ -176,6 +176,7 @@
                                                                 variant(31) CallDepth (unit)
                                                                 variant(32) MissingAccount (unit)
                                                                 variant(33) ReentrancyNotAllowed (unit)
+                                                                variant(34) MaxSeedLengthExceeded (unit)
                                                     variant(9) CallChainTooDeep (unit)
                                                     variant(10) MissingSignatureForFee (unit)
                                                     variant(11) InvalidAccountIndex (unit)

* Fix some merge conflicts...
2020-07-06 20:22:23 +09:00
sakridge
d9b389f510 Reduce logging lines (#10835) 2020-06-29 15:57:28 -07:00
Greg Fitzgerald
6ee222363e Move BankForks to solana_runtime (#10637)
* Move BankForks to solana_runtime

* Update imports
2020-06-17 15:27:03 +00:00
sakridge
1eca9b19ab Entry verify cleanup and gossip counters (#10632)
* Add prune message counter

* Switch to us verification time to match other counters

* Add separate transaction/poh verify timing
2020-06-16 14:00:29 -07:00
anatoly yakovenko
ba83e4ca50 Fix fannout gossip bench (#10509)
* Gossip benchmark

* Rayon tweaking

* push pulls

* fanout to max nodes

* fixup! fanout to max nodes

* fixup! fixup! fanout to max nodes

* update

* multi vote test

* fixup prune

* fast propagation

* fixups

* compute up to 95%

* test for specific tx

* stats

* stats

* fixed tests

* rename

* track a lagging view of which nodes have the local node in their active set in the local received_cache

* test fixups

* dups are old now

* dont prune your own origin

* send vote to tpu

* tests

* fixed tests

* fixed test

* update

* ignore scale

* lint

* fixup

* fixup

* fixup

* cleanup

Co-authored-by: Stephen Akridge <sakridge@gmail.com>
2020-06-13 22:03:38 -07:00
sakridge
4c140acb3b ClusterInfo cleanup (#10504)
automerge
2020-06-10 17:00:17 -07:00
sakridge
6eb5ef6ac7 Add back missing pull_response success counter (#10491) 2020-06-10 09:17:57 -07:00
sakridge
ecb6959720 Optimize process pull responses (#10460)
* Batch process pull responses

* Generate pull requests at 1/2 rate

* Do filtering work of process_pull_response in read lock

Only take write lock to insert if needed.
2020-06-09 17:08:13 -07:00
anatoly yakovenko
832d324a23 Revert "Gossip PullRequests tend to return a lot of duplicates. (#10326)" (#10455)
This reverts commit 31e20eff82.
2020-06-09 07:27:00 -07:00
Kristofer Peterson
e23340d89e Clippy cleanup for all targets and nighly rust (also support 1.44.0) (#10445)
* address warnings from 'rustup run beta cargo clippy --workspace'

minor refactoring in:
- cli/src/cli.rs
- cli/src/offline/blockhash_query.rs
- logger/src/lib.rs
- runtime/src/accounts_db.rs

expect some performance improvement AccountsDB::clean_accounts()

* address warnings from 'rustup run beta cargo clippy --workspace --tests'

* address warnings from 'rustup run nightly cargo clippy --workspace --all-targets'

* rustfmt

* fix warning stragglers

* properly fix clippy warnings test_vote_subscribe()
replace ref-to-arc with ref parameters where arc not cloned

* Remove lock around JsonRpcRequestProcessor (#10417)

automerge

* make ancestors parameter optional to avoid forcing construction of empty hash maps

Co-authored-by: Greg Fitzgerald <greg@solana.com>
2020-06-09 09:38:14 +09:00