* adds the instance token to crds-labels for node-instance crds-values (#14037)
If a node "a" receives instance-info from node "b1" it will override any
instance-info associated with "b1" pubkey in its crds table. This makes
it less likely that when "b1" receives crds values from "a" (either
through pull or push), it sees other instances of itself (because node
"a" discarded them when it received "b1" instance info).
In order for the crds table to contain all instance-info associated with
the same pubkey at the same time, we need to add the instance tokens to
the keys in the crds table (i.e. the CrdsValueLabel).
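A minimal sketch of the idea, with a simplified stand-in for the real CrdsValueLabel: once the instance token is part of the key, two node-instance values for the same pubkey can live in the crds table side by side.

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32]; // stand-in for the real pubkey type

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum CrdsValueLabel {
    ContactInfo(Pubkey),
    // The instance token is part of the key, so a second instance of the
    // same pubkey no longer overrides the first one.
    NodeInstance(Pubkey, /*token:*/ u64),
}

fn main() {
    let mut crds: HashMap<CrdsValueLabel, &'static str> = HashMap::new();
    let pubkey = [1u8; 32];
    crds.insert(CrdsValueLabel::NodeInstance(pubkey, 42), "instance from b1");
    crds.insert(CrdsValueLabel::NodeInstance(pubkey, 43), "duplicate instance of b1");
    // Both entries coexist, so "b1" can later observe the duplicate instance.
    assert_eq!(crds.len(), 2);
}
```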
(cherry picked from commit 409fe3bca1)
# Conflicts:
# core/src/cluster_info.rs
* removes backport merge conflicts
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
* checks for duplicate validator instances using gossip
(cherry picked from commit 8cd5eb9863)
# Conflicts:
# core/src/cluster_info.rs
* pushes node-instance along with version early in gossip
(cherry picked from commit 542198180a)
* removes RwLock on ClusterInfo.instance
(cherry picked from commit 895d7d6a65)
# Conflicts:
# core/src/cluster_info.rs
* std::process::exit to kill all threads
(cherry picked from commit 1d267eae6b)
* removes backport merge conflicts
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
process_pull_requests acquires a write lock on crds table to update
records timestamp for each of the pull-request callers:
https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300
However, pull-requests overlap heavily in their callers, so this function
ends up doing a lot of redundant work.
This commit obtains the unique callers before acquiring an exclusive lock
on the crds table.
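The pattern is roughly as in the sketch below (names and types are illustrative, not the actual crds_gossip_pull API): deduplicate the callers first, then update each record timestamp once under the write lock.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

type Pubkey = [u8; 32]; // stand-in for the real pubkey type

// Update the record timestamp of each pull-request caller exactly once.
fn update_record_timestamps(
    crds: &RwLock<HashMap<Pubkey, /*timestamp:*/ u64>>,
    callers: &[Pubkey],
    now: u64,
) {
    // Deduplicate before taking the lock; overlapping pull-requests share callers.
    let unique: HashSet<&Pubkey> = callers.iter().collect();
    let mut table = crds.write().unwrap();
    for caller in unique {
        table.insert(*caller, now);
    }
}
```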
(cherry picked from commit 26bf2b7e45)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Validator logs show that prune messages are dropped because they exceed
packet data size:
https://github.com/solana-labs/solana/blob/f25c969ad/perf/src/packet.rs#L90-L92
This can exacerbate gossip traffic by redundantly increasing push
messages across the network. The workaround is to break prunes into
smaller chunks and send them over in multiple messages.
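A rough sketch of the workaround (the constants and header size below are assumptions, not the actual gossip protocol): split the prune list so each chunk fits within the packet data size and goes out in its own message.

```rust
type Pubkey = [u8; 32]; // stand-in for the real pubkey type

const PACKET_DATA_SIZE: usize = 1232; // typical UDP payload budget
const PRUNE_HEADER_SIZE: usize = 128; // assumed per-message overhead

// Break a long prune list into chunks that each fit in one packet.
fn split_prunes(prunes: &[Pubkey]) -> Vec<Vec<Pubkey>> {
    let max_per_msg =
        (PACKET_DATA_SIZE - PRUNE_HEADER_SIZE) / std::mem::size_of::<Pubkey>();
    prunes
        .chunks(max_per_msg.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```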
(cherry picked from commit 1ffab5de77)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
split_gossip_messages:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L1536-L1574
splits crds-values into chunks that fit into a gossip packet. However, it
uses a global upper-bound for the header size across all protocols:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L90-L93
This can be wasteful, since a specific gossip protocol can have a smaller
header than this upper-bound (e.g. the Protocol::PushMessage header is
170 bytes smaller). Packing more crds-values into one gossip packet
avoids the overhead of separate packets and reduces the total number of
bytes sent over the wire.
This commit updates the splitting function to take a max-chunk-size
argument. At the call-site, this value is set based on the header size of
the protocol over which the values are sent.
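Something like the sketch below (simplified, with an assumed serialized_size callback): greedily pack values into chunks of at most max_chunk_size bytes, where the caller computes max_chunk_size from the header of the specific protocol the values will be sent over.

```rust
// Split values into chunks whose serialized sizes sum to at most max_chunk_size.
fn split_gossip_messages<T>(
    values: Vec<T>,
    max_chunk_size: usize,
    serialized_size: impl Fn(&T) -> usize,
) -> Vec<Vec<T>> {
    let mut chunks = Vec::new();
    let (mut chunk, mut chunk_size) = (Vec::new(), 0usize);
    for value in values {
        let size = serialized_size(&value);
        if chunk_size + size > max_chunk_size && !chunk.is_empty() {
            // Close the current chunk and start a new one.
            chunks.push(std::mem::take(&mut chunk));
            chunk_size = 0;
        }
        chunk.push(value);
        chunk_size += size;
    }
    if !chunk.is_empty() {
        chunks.push(chunk);
    }
    chunks
}
```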
(cherry picked from commit 5e8490ab9d)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
In several places in the gossip code, the entire crds table is scanned
only to extract the nodes' contact-infos. Currently on mainnet the crds
table has ~70k entries while there are only ~470 nodes, so the full table
scan is inefficient. Instead, we can maintain an index of just the nodes'
contact-infos.
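A minimal sketch of such an index (a trivial stand-in for the real Crds struct): keep the set of contact-info keys alongside the table, so listing nodes is proportional to the number of nodes rather than the size of the whole table.

```rust
use std::collections::{HashMap, HashSet};

type Pubkey = [u8; 32]; // stand-in for the real pubkey type

#[derive(Default)]
struct Crds {
    table: HashMap<Pubkey, String>, // the full crds table (values simplified)
    nodes: HashSet<Pubkey>,         // index of keys that are contact-infos
}

impl Crds {
    fn insert(&mut self, key: Pubkey, value: String, is_contact_info: bool) {
        if is_contact_info {
            self.nodes.insert(key);
        }
        self.table.insert(key, value);
    }

    // O(number of nodes) instead of a scan over the ~70k-entry table.
    fn contact_infos(&self) -> impl Iterator<Item = &String> + '_ {
        self.nodes.iter().filter_map(move |key| self.table.get(key))
    }
}
```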
(cherry picked from commit cbea9ebc34)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Packet::from_data is ignoring serialization errors:
https://github.com/solana-labs/solana/blob/d08c3232e/sdk/src/packet.rs#L42-L48
This is likely never useful, as the packet will still be sent over the
wire, consuming bandwidth, but at the receiving end it will either fail
to deserialize or be invalid.
This commit propagates the errors out of the function to the call-site,
allowing the caller to handle them.
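The fix follows a pattern like this sketch (a simplified struct, not the exact sdk definition), using bincode's fallible serialization so the error reaches the caller:

```rust
use std::io::Cursor;

const PACKET_DATA_SIZE: usize = 1232; // assumed maximum packet payload

struct Packet {
    data: [u8; PACKET_DATA_SIZE],
    size: usize,
}

impl Packet {
    // Return the serialization error instead of silently ignoring it.
    fn from_data<T: serde::Serialize>(data: &T) -> bincode::Result<Self> {
        let mut packet = Packet {
            data: [0u8; PACKET_DATA_SIZE],
            size: 0,
        };
        let mut writer = Cursor::new(&mut packet.data[..]);
        bincode::serialize_into(&mut writer, data)?; // propagated to the caller
        packet.size = writer.position() as usize;
        Ok(packet)
    }
}
```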
(cherry picked from commit 73ac104df2)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
https://hackerone.com/reports/991106
> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol
The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, as well as
in-flight ping messages pending a pong response from the remote node.
When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and ping packets are added for
addresses which need to be (re)verified.
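A minimal sketch of the cache (names and the fixed TTL are illustrative; the real PingCache also rate-limits pings and tracks hashes of outstanding ping tokens): remember which addresses answered a ping recently, and filter pull-requests from the rest.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

struct PingCache {
    ttl: Duration,                          // how long a verified address stays valid
    verified: HashMap<SocketAddr, Instant>, // addr -> time of the last valid pong
}

impl PingCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, verified: HashMap::new() }
    }

    // Record that the remote node returned a valid pong.
    fn record_pong(&mut self, addr: SocketAddr, now: Instant) {
        self.verified.insert(addr, now);
    }

    // Pull-requests from addresses that fail this check are dropped, and a
    // ping is (re)sent to them instead.
    fn check(&self, addr: &SocketAddr, now: Instant) -> bool {
        matches!(self.verified.get(addr), Some(&t) if now.duration_since(t) < self.ttl)
    }
}
```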
(cherry picked from commit ae91270961)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
* Allow nodes to advertise a different rpc address over gossip
* Feedback
(cherry picked from commit 8b0242a5d8)
Co-authored-by: Justin Starry <justin@solana.com>
ClusterInfo::process_packets handles incoming packets in a thread_pool:
https://github.com/solana-labs/solana/blob/87311cce7/core/src/cluster_info.rs#L2118-L2134
However, run-time profiling shows that the threads are not well utilized
and a lot of the processing is done sequentially.
This commit redistributes the work to be done in parallel. Testing on a
gce cluster shows a 20%+ improvement in gossip packet processing time,
with much smaller variance.
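The shape of the change is roughly as below (packet and handler types are placeholders; rayon is assumed): flatten the packet batches into one parallel iterator so the thread-pool sees all the work at once rather than one batch at a time.

```rust
use rayon::prelude::*;

struct Packet {
    data: Vec<u8>,
}

// Process all packets across the rayon thread-pool rather than batch by batch.
fn process_packets(batches: &[Vec<Packet>]) -> usize {
    batches
        .par_iter()                      // batches in parallel ...
        .flatten()                       // ... and packets within each batch
        .map(|packet| packet.data.len()) // placeholder for deserialize + handle
        .sum()
}
```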
(cherry picked from commit 75d62ca095)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
* passes through feature-set to down to gossip requests handling
* takes the feature-set from root_bank instead of working_bank
(cherry picked from commit 48283161c3)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Based on run-time profiles, the majority of the time in new_pull_requests
is spent building bloom filters, in hashing and bit-vec operations.
This commit builds the crds filters in parallel using rayon constructs.
The added benchmark shows a ~5x speedup (4-core machine, 8 threads).
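The parallelism looks roughly like this sketch (the Bloom type is a trivial stand-in for the real bloom filter, and partitioning values by key is an assumption): build each filter partition on its own rayon task, since hashing dominates the cost.

```rust
use rayon::prelude::*;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Trivial stand-in for a real bloom filter over a bit-vec.
struct Bloom {
    bits: Vec<bool>,
}

impl Bloom {
    fn add<T: Hash>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let bit = (hasher.finish() as usize) % self.bits.len();
        self.bits[bit] = true;
    }
}

// Build one filter per partition in parallel.
fn build_crds_filters(values: &[u64], num_filters: usize, num_bits: usize) -> Vec<Bloom> {
    (0..num_filters)
        .into_par_iter()
        .map(|partition| {
            let mut filter = Bloom { bits: vec![false; num_bits] };
            for value in values.iter().filter(|v| (**v as usize) % num_filters == partition) {
                filter.add(value);
            }
            filter
        })
        .collect()
}
```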
* Filter push/pulls from spies
* Don't pull from peers with shred_version == 0, don't push to nodes with shred_version == 0
Co-authored-by: Carl <carl@solana.com>
* Gossip benchmark
* Rayon tweaking
* push pulls
* fanout to max nodes
* fixup! fanout to max nodes
* fixup! fixup! fanout to max nodes
* update
* multi vote test
* fixup prune
* fast propagation
* fixups
* compute up to 95%
* test for specific tx
* stats
* stats
* fixed tests
* rename
* track a lagging view of which nodes have the local node in their active set in the local received_cache
* test fixups
* dups are old now
* dont prune your own origin
* send vote to tpu
* tests
* fixed tests
* fixed test
* update
* ignore scale
* lint
* fixup
* fixup
* fixup
* cleanup
Co-authored-by: Stephen Akridge <sakridge@gmail.com>
* Batch process pull responses
* Generate pull requests at 1/2 rate
* Do the filtering work of process_pull_response under a read lock
Only take the write lock to insert when needed.
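A minimal sketch of the locking pattern (the table and value types are simplified stand-ins for the crds table): do the freshness check under the shared read lock and take the exclusive write lock only for values that actually need to be inserted.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// key -> wallclock, as a stand-in for the crds table and its values.
fn process_pull_response(crds: &RwLock<HashMap<u64, u64>>, response: Vec<(u64, u64)>) {
    // Filter under the shared read lock: keep only values that are new or newer.
    let fresh: Vec<(u64, u64)> = {
        let table = crds.read().unwrap();
        response
            .into_iter()
            .filter(|(key, wallclock)| table.get(key).map_or(true, |old| wallclock > old))
            .collect()
    };
    if fresh.is_empty() {
        return; // nothing to insert, so the write lock is never taken
    }
    // Only now take the exclusive write lock, and only for the fresh values.
    let mut table = crds.write().unwrap();
    for (key, wallclock) in fresh {
        table.insert(key, wallclock);
    }
}
```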