solana

Author	SHA1	Message	Date
behzad nouri	1ac2a8cfa5	removes delayed crds inserts when upserting gossip table (#16806 ) It is crucial that VersionedCrdsValue::insert_timestamp does not go backward in time: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L67-L79 Otherwise methods such as get_votes and get_epoch_slots_since will break, which will break their downstream flow, including vote-listener and optimistic confirmation: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 For that, Crds::new_versioned is intended to be called "atomically" with Crds::insert_versioned (as the comment already says so): https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L126-L129 However, currently this is violated in the code. For example, filter_pull_responses creates VersionedCrdsValues (with the current timestamp), then acquires an exclusive lock on gossip, then process_pull_responses writes those values to the crds table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L2375-L2392 Depending on the workload and lock contention, the insert_timestamps may well be in the past when these values finally are inserted into gossip. To avoid such scenarios, this commit: * removes Crds::new_versioned and Crds::insert_versioned. * makes VersionedCrdsValue constructor private, only invoked in Crds::insert, so that insert_timestamp is populated right before insert. This will improve insert_timestamp monotonicity as long as Crds::insert is not called with a stalled timestamp. Following commits may further improve this by calling timestamp() inside Crds::insert, and/or switching to std::time::Instant which guarantees monotonicity.	2021-04-28 11:56:13 +00:00
behzad nouri	2c82f2154d	retains crds values if the origin is still active (#16576 ) Local timestamps are updated for records associated with a pubkey if the origin is still active: https://github.com/solana-labs/solana/blob/c8ed14c64/core/src/crds.rs#L301-L311 However this is done inconsistently on some gossip paths (pull requests and pull responses) but not all (e.g. push messages). Additionally update_record_timestamp is inefficient since there can be ~800 values associated with each pubkey. This commit updates records timestamps only on contact-infos; and, instead utilizes origin's timestamp when purging old values.	2021-04-23 15:14:49 +00:00
behzad nouri	bc90e04e64	uses current local timestamp when recording purged values CrdsGossipPull.purged_values is meant to record recently purged values so that they are excluded from imminent pull requests, until the entire cluster has synced to the updated value: https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L449-L454 However, VersionedCrdsValue.local_timestamp represents the local time when the value was last updated, and given that crds values may have different timeouts based on stake, it does not necessarily represent how recently the value was purged: https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds.rs#L75-L76 As such, recording current local timestamp when purging values is more appropriate. Additionally, purge_purged assumes that purged_values is sorted by timestamp when draining the old ones; which is not true if those timestamps are VersionedCrdsValue.local_timestamp: https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L563-L571	2021-04-20 11:21:00 +00:00
François Garillot	b08cff9e77	Simplify some pattern-matches (#16402 ) When those match an exact combinator on Option / Result. Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).	2021-04-08 12:40:37 -06:00
behzad nouri	a6c23648cb	limits CrdsGossipPull::pull_request_time size (#15793 ) There is no pruning logic on CrdsGossipPull::pull_request_time https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_pull.rs#L172-L174 potentially allowing this to take too much memory. Additionally, CrdsGossipPush::last_pushed_to is pruning recent push timestamps: https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_push.rs#L275-L279 instead of the older ones. Co-authored-by: Nathan Hawkins <utsl@utsl.org>	2021-03-24 18:33:56 +00:00
behzad nouri	f2865dfd63	requires stakes for propagating crds values through gossip (#15561 )	2021-03-12 15:50:14 +00:00
behzad nouri	56923c91bf	limits number of unique pubkeys in the crds table (#15539 )	2021-03-10 20:46:05 +00:00
behzad nouri	5a9896706c	indexes epoch slots in crds table (#15459 ) ClusterInfo::get_epoch_slots_since scans the entire crds table to obtain epoch-slots inserted since a timestamp: https://github.com/solana-labs/solana/blob/013daa8f4/core/src/cluster_info.rs#L1245-L1262 The alternative is to index epoch-slots in crds table ordered by their insert timestamp.	2021-02-26 14:12:04 +00:00
behzad nouri	491b059755	broadcasts duplicate shreds through gossip (#14699 )	2021-01-24 15:47:43 +00:00
behzad nouri	766195dded	limits number of crds values associated with a pubkey (#14467 )	2021-01-08 18:54:40 +00:00
behzad nouri	2fd38d9912	indexes votes in crds table (#14272 )	2020-12-27 13:31:05 +00:00
behzad nouri	6a3797e164	adds crds-value for broadcasting duplicate shreds through gossip (#14133 ) In gossip, the header overhead we get from: https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/cluster_info.rs#L434-L435 https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L31-L36 https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L73 already exceeds SIZE_OF_NONCE in shreds. We also need additional meta-data (wallclock, source pubkey, ...). Which means that given the SHRED_PAYLOAD_SIZE, we cannot fit all these in PACKET_DATA_SIZE: https://github.com/solana-labs/solana/blob/de9ac43eb/ledger/src/shred.rs#L80 On top of that, we need 2 shred payloads as the proof of duplicate. So each DuplicateShred crds value includes only a chunk of the payload, along with the meta-data to reconstruct the full payload from the chunks on the receiving end.	2020-12-18 14:32:43 +00:00
behzad nouri	c2b7115031	indexes crds values associated with a pubkey (#14088 ) record_labels returns all the possible labels for a record identified by a pubkey, used in updating timestamp of crds values: https://github.com/solana-labs/solana/blob/1792100e2/core/src/crds_value.rs#L560-L577 https://github.com/solana-labs/solana/blob/1792100e2/core/src/crds.rs#L240-L251 The code relies on CrdsValueLabel to be limited to a small deterministic set of possible values for a fixed pubkey. As we expand crds values to include duplicate shreds, this limits what the duplicate proofs can be keyed by in the table. In addition the computation of these labels is inefficient and will become more so as duplicate shreds and more types of crds values are added. An alternative is to maintain an index of all crds values associated with a pubkey.	2020-12-15 01:49:22 +00:00
behzad nouri	c3048b451d	samples repair peers using WeightedIndex (#13919 ) To output one random sample, weighted_best generates n random numbers: https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/weighted_shuffle.rs#L38-L63 WeightedIndex does so with only one random number: https://github.com/rust-random/rand/blob/eb02f0e46/src/distributions/weighted_index.rs#L223-L240 Additionally, if the index is already constructed, it only does a total of O(log(n)) amount of work; which can be achieved if RepairCache, caches the weighted index: https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/serve_repair.rs#L83 Also, the repair-peers code can be reorganized to have fewer redundant unlock-then-lock code.	2020-12-03 14:26:07 +00:00
behzad nouri	26bf2b7e45	processes pull-request callers only once per unique caller (#13750 ) process_pull_requests acquires a write lock on crds table to update records timestamp for each of the pull-request callers: https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300 However, pull-requests overlap a lot in callers and this function ends up doing a lot of redundant duplicate work. This commit obtains unique callers before acquiring an exclusive lock on crds table.	2020-11-22 17:51:14 +00:00
behzad nouri	b58f69297f	makes crds fields private (#13703 ) Crds fields should maintain several invariants between themselves, so exposing them as public fields can be bug prone. In addition these invariants are asserted on every write: https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154 https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262 which adds extra instructions and is not optimal. Should these fields be private the asserts will be redundant.	2020-11-19 20:57:40 +00:00
behzad nouri	cbea9ebc34	indexes nodes' contact infos in crds table (#13553 ) In several places in gossip code, the entire crds table is scanned only to filter out nodes' contact infos. Currently on mainnet, crds table is of size ~70k, while there are only ~470 nodes. So the full table scan is inefficient. Instead we may maintain an index of only nodes' contact infos.	2020-11-15 16:38:04 +00:00
behzad nouri	10fa4f45ab	uses thread-pool when handling push messages (#13338 ) From runtime profiles, the majority time of solana-listen thread: https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L2720 is spent handling push messages. The code here: https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L2272-L2364 may utilize the idle gossip thread-pool.	2020-11-04 19:15:58 +00:00
behzad nouri	37c8842bcb	scans crds table in parallel for finding old labels (#13073 ) From runtime profiles, the majority time of ClusterInfo::handle_purge https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626 is spent scanning crds table finding old labels: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197 This can be done in parallel given that gossip thread-pool: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641 is idle when handle_purge is invoked: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681	2020-10-23 14:17:37 +00:00
Michael Vines	959880db60	Remove unused pubkey::Pubkey imports	2020-10-21 19:08:13 -07:00
Michael Vines	7bc073defe	Run `codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand`	2020-10-21 19:08:13 -07:00
behzad nouri	1866521df6	retains hash value of outdated responses received from pull requests (#12513 ) pull_response_fail_inserts has been increasing: https://cdn.discordapp.com/attachments/478692221441409024/759096187587657778/pull_response_fail_insert.png but outdated values which fail to insert: https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L332-L344 https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds.rs#L104-L108 are not recorded anywhere, and so the next pull request may obtain the same redundant payload again, unnecessarily taking bandwidth. This commit holds on to the hashes of failed-inserts for a while, similar to purged_values: https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L380 and filters them out for the next pull request: https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L204	2020-10-01 00:39:22 +00:00
behzad nouri	57ed4e4657	patches bug in Crds::find_old_labels with pubkey specific timeout (#12528 ) Current code only returns values which are expired based on the default timeout. Example from the added unit test: - value inserted at time 0 - pubkey specific timeout = 1 - default timeout = 3 Then at now = 2, the value is expired, but the function fails to return the value because it compares with the default timeout.	2020-09-29 09:04:40 +00:00
behzad nouri	9b866d79fb	shards crds values based on their hash prefix (#12187 ) filter_crds_values checks every crds filter against every hash value: https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432 which can be inefficient if the filter's bit-mask only matches a small portion of the entire crds table. This commit shards crds values into separate tables based on shard_bits first bits of their hash prefix. Given a (mask, mask_bits) filter, filtering crds can be done by inspecting only relevant shards. If CrdsFilter.mask_bits <= shard_bits, then precisely only the crds values which match (mask, mask_bits) bit pattern are traversed. If CrdsFilter.mask_bits > shard_bits, then approximately only 1/2^shard_bits of crds values are inspected. Benchmarking on a gce cluster of 20 nodes, I see ~10% improvement in generate_pull_responses metric, but with larger clusters, crds table and 2^mask_bits are both larger, so the impact should be more significant.	2020-09-17 14:05:16 +00:00
sakridge	f519fdecc2	generate_pull_response optimization (#11597 )	2020-08-12 22:45:19 -07:00
Greg Fitzgerald	0550b893b0	Fix typos (#10675 ) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2020-06-17 20:54:52 -07:00
anatoly yakovenko	ba83e4ca50	Fix fannout gossip bench (#10509 ) * Gossip benchmark * Rayon tweaking * push pulls * fanout to max nodes * fixup! fanout to max nodes * fixup! fixup! fanout to max nodes * update * multi vote test * fixup prune * fast propagation * fixups * compute up to 95% * test for specific tx * stats * stats * fixed tests * rename * track a lagging view of which nodes have the local node in their active set in the local received_cache * test fixups * dups are old now * dont prune your own origin * send vote to tpu * tests * fixed tests * fixed test * update * ignore scale * lint * fixup * fixup * fixup * cleanup Co-authored-by: Stephen Akridge <sakridge@gmail.com>	2020-06-13 22:03:38 -07:00
sakridge	ecb6959720	Optimize process pull responses (#10460 ) * Batch process pull responses * Generate pull requests at 1/2 rate * Do filtering work of process_pull_response in read lock Only take write lock to insert if needed.	2020-06-09 17:08:13 -07:00
Kristofer Peterson	58ef02f02b	9951 clippy errors in the test suite (#10030 ) automerge	2020-05-15 09:35:43 -07:00
anatoly yakovenko	b150da837a	Use epoch as the gossip purge timeout for staked nodes. (#7005 ) automerge	2019-11-20 11:25:18 -08:00
Sagar Dhawan	568475e2db	Fix incorrectly signed CrdsValues (#6696 )	2019-11-03 10:07:51 -08:00
TristanDebrunner	9e52d11ad0	Remove Backend trait (#6407 )	2019-10-17 15:19:27 -06:00
Greg Fitzgerald	fcef54d062	Add a constructor to generate random pubkeys	2019-03-31 16:23:18 -06:00
Rob Walker	195a880576	pass Pubkeys as refs, copy only where values needed (#3213 ) * pass Pubkeys as refs, copy only where values needed * Pubkey is pervasive * fixup	2019-03-09 19:28:43 -08:00
Michael Vines	79b2542ca4	Remove CrdsValue::LeaderId	2019-03-08 19:41:51 -08:00
Michael Vines	5f5d779ee1	Move src/ into core/src. Top-level crate is now called solana-workspace	2019-03-02 09:52:18 -08:00

36 Commits
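
The expiry bug described in commit 57ed4e4657 (#12528) can be illustrated with a minimal sketch. This is not the actual Crds::find_old_labels code: the names Record and find_old, the u8 origin id standing in for a pubkey, and the exact comparison (elapsed time strictly greater than the timeout) are all assumptions made for illustration. It only shows the idea that a pubkey-specific timeout, when present, should take precedence over the default.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a crds record: an origin id (in place of a
/// pubkey) and the local timestamp at which the value was last updated.
struct Record {
    origin: u8,
    local_timestamp: u64,
}

/// Returns the indices of records considered expired at `now`.
/// A per-origin entry in `timeouts` overrides `default_timeout`.
fn find_old(
    records: &[Record],
    now: u64,
    timeouts: &HashMap<u8, u64>,
    default_timeout: u64,
) -> Vec<usize> {
    records
        .iter()
        .enumerate()
        .filter(|(_, record)| {
            // Look up the origin-specific timeout first; fall back to the default.
            let timeout = timeouts
                .get(&record.origin)
                .copied()
                .unwrap_or(default_timeout);
            // Assumed rule: expired once more than `timeout` has elapsed.
            now.saturating_sub(record.local_timestamp) > timeout
        })
        .map(|(index, _)| index)
        .collect()
}
```

With the numbers from the commit message (value inserted at time 0, pubkey-specific timeout 1, default timeout 3), calling find_old at now = 2 reports the record as expired, whereas a check that consulted only the default timeout would not.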