solana

Author	SHA1	Message	Date
behzad nouri	ff0e623d30	removes the nested for loop from retransmit-stage The code can be simplified by just flattening the vector of packets.	2021-05-21 17:10:56 +00:00
behzad nouri	71de021177	adds metric for turbine retransmit tree mismatch In order to remove port-based forwarding logic in turbine, we need to first track how often the turbine retransmit/broadcast trees mismatch across nodes. One consistency condition is that if the node is on the critical path (i.e. the first node in each neighborhood), then we expect that the packet arrives at the tvu socket as opposed to tvu-forwards. This commit adds a metric to track how often the above condition is not met.	2021-05-21 17:10:56 +00:00
behzad nouri	2adce67260	extends crds values timeouts if stakes are unknown (#17261 ) If stakes are unknown, then timeouts will be short, resulting in values being purged from the crds table, and consequently higher pull-response load when they are obtained again from gossip. In particular, this slows down validator start where almost all values obtained from entrypoint are immediately discarded.	2021-05-21 15:55:22 +00:00
behzad nouri	5e6b00fe98	prioritizes more recent values in pull responses (#17238 ) On the receiving end, the outdated values are discarded, and they will only waste bandwidth: https://github.com/solana-labs/solana/blob/3f0480d06/core/src/crds_gossip_pull.rs#L385-L400 This is also exacerbating validator start, since the entrypoint is returning old values in pull responses, and the validator immediately discards those; resulting in huge delay until the validator obtains contact-info of the entrypoint and is able to adopt shred-version and fully start.	2021-05-21 14:07:46 +00:00
behzad nouri	e8b35a4f7b	bumps up min number of bloom items in gossip pull requests (#17236 ) When a validator starts, it has an (almost) empty crds table and it only sends one pull-request to the entrypoint. The bloom filter in the pull-request targets a 10% false-positive rate given the number of items. So, if the `num_items` is very wrong, it makes a very small bloom filter with a very high false-positive rate: https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80 https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48 As a result, it is very unlikely that the validator obtains entrypoint's contact-info in response. This exacerbates how long the validator will loop on: > Waiting to adopt entrypoint shred version https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412 This commit increases the min number of bloom items when making gossip pull requests. Effectively this will break the entrypoint crds table into 64 shards, with one pull-request and a larger bloom filter for each shard, and increase the chances that the response will include entrypoint's contact-info, which is needed for adopting shred version and validator start.	2021-05-21 13:59:26 +00:00
behzad nouri	13b032b2d4	removes manual trait impl for contact-info (#17332 ) The current implementations use only the id and disregard other fields, in particular wallclock. This can lead to bugs where an outdated contact-info shadows or overrides a current one because they compare equal.	2021-05-19 20:56:10 +00:00
behzad nouri	e7073ecab1	adds gossip metrics for number of staked nodes (#17330 )	2021-05-19 19:25:21 +00:00
Tao Zhu	0781fe1b4f	Upgrade Rust to 1.52.0 (#17096 ) * Upgrade Rust to 1.52.0 update nightly_version to newly pushed docker image fix clippy lint errors 1.52 comes with grcov 0.8.0, include this version to script * upgrade to Rust 1.52.1 * disabling Serum from downstream projects until it is upgraded to Rust 1.52.1	2021-05-19 09:31:47 -05:00
Tyera Eulberg	827355a6b1	Create solana-rpc crate and move subscriptions (#17320 ) * Move non_circulating_supply to runtime * Add solana-rpc crate and move max_slots * Move subscriptions to solana-rpc * Single use statements	2021-05-19 00:54:28 -06:00
behzad nouri	f7b0184f81	patches flaky test_new_mark_creation_time (#17288 )	2021-05-18 13:39:35 +00:00
Trent Nelson	67e6a3106f	rpc: plumb shred_version through RpcContactInfo	2021-05-14 08:36:08 +00:00
Tyera Eulberg	27004f1b76	Return error for excluded secondary-index keys (#17193 ) * Add runtime helpers to check secondary indexes for key * Add custom rpc error * Check secondary-index key inclusion in rpc * Clone complete AccountSecondaryIndexes into rpc to avoid bank query	2021-05-13 21:04:21 +00:00
behzad nouri	0e646d10bb	prunes received-cache only once per unique owner's key (#17039 )	2021-05-13 13:50:16 +00:00
behzad nouri	0aa7824884	retains one node-instance per pubkey (#17187 ) crds table retains up to 32 node-instance values per pubkey. This is because if there are multiple running instances of the same node, then we want gossip to propagate node-instance values associated with both instances, therefore the corresponding label/key includes the randomly generated token in addition to the pubkey: https://github.com/solana-labs/solana/blob/9c42a89a4/core/src/crds_value.rs#L448 https://github.com/solana-labs/solana/pull/14037 As a result, the number of such values per pubkey is effectively unbounded, requiring custom mitigations implemented in: https://github.com/solana-labs/solana/pull/14467 but still taking redundant extra memory and bandwidth. This commit instead retains only one node-instance per pubkey by extending crds values override logic. If a crds value is of type node-instance, it will always override an existing one with the same key if it has a more recent starting timestamp (not wallclock). As a result, gossip will always propagate the node-instance with the more recent timestamp. Since the check_duplicate logic will stop the node with the older timestamp, this change should preserve existing functionality.	2021-05-13 13:35:46 +00:00
Lijun Wang	9c42a89a43	Issue #17008 -- make the number of snapshot archives to retain configurable. (#17158 ) * purge_old_snapshot_archives now takes an extra argument 'maximum_snapshots_to_retain' to control the maximum number of most recent snapshot archives to retain. Note that the oldest snapshot is always retained as before and is not subject to this new option. * The validator and ledger-tool executables gain a CLI argument --maximum-snapshots-to-retain, and the option is propagated down the call chains. Their corresponding shell scripts were changed accordingly. * SnapshotConfig gains an extra field, maximum_snapshots_to_retain * Unit tests are added to cover purge_old_snapshot_archives	2021-05-12 10:32:27 -07:00
Tyera Eulberg	6e9deaf1bd	Move block-time caching earlier (#17109 ) * Require that blockstore block-time only be recognized slot, instead of root * Move cache_block_time to after Bank freeze * Single use statement * Pass transaction_status_sender by reference * Remove unnecessary slot-existence check before caching block time altogether * Move block-time existence check into Blockstore::cache_block_time, Blockstore no longer needed in blockstore_processor helper	2021-05-10 13:14:56 -06:00
Jeff Washington (jwash)	f39dda00e0	type AccountSecondaryIndexes = HashSet (#17108 )	2021-05-10 14:22:48 +00:00
behzad nouri	81ad795d46	removes position field in coding-shred-header CodingShredHeader.position is equal to ShredCommonHeader.index - ShredCommonHeader.fec_set_index and so is redundant. The extra position field can introduce bugs if it is not consistent with index and fec_set_index.	2021-05-10 13:20:56 +00:00
behzad nouri	22c02b917e	reads gossip push messages off crds ordinal index Having an ordinal index on crds values based on insert order allows values to be filtered efficiently using a cursor. In particular, the CrdsGossipPush::push_messages hash-map can be replaced with a cursor, saving on bookkeeping, purging, etc.	2021-05-09 22:40:41 +00:00
behzad nouri	dfa3e7a61c	indexes crds values by their insert order	2021-05-09 22:40:41 +00:00
Michael Vines	d6c076f1b6	getBlockProduction now correctly reports block production	2021-05-07 19:04:51 -07:00
behzad nouri	fa86a335b0	implements cursor for gossip crds table queries (#16952 ) VersionedCrdsValue.insert_timestamp is used for fetching crds values inserted since last query: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 So it is crucial that insert_timestamp does not go backward in time when new values are inserted into the table. However, std::time::SystemTime is not monotonic, and due to workload, lock contention, thread scheduling, etc., new values may be inserted with a stale timestamp far in the past. Additionally, reading system time for the above purpose is inefficient/unnecessary. This commit adds an ordinal index to crds values indicating their insert order. Additionally, it implements a new Cursor type for fetching values inserted since last query.	2021-05-06 14:04:17 +00:00
Michael Vines	9ba2c53b85	Add --tower argument to specify where tower files are persisted	2021-05-05 12:20:39 -07:00
Trent Nelson	f17b80236f	test-validator: Plumb --limit-ledger-size	2021-05-04 08:45:24 +00:00
carllin	bc7e741514	Integrate gossip votes into switching threshold (#16973 )	2021-05-04 00:51:42 -07:00
publish-docs.sh	6318705607	Add keys	2021-05-03 17:18:54 -07:00
publish-docs.sh	b948a18841	Key rotation	2021-05-03 17:18:54 -07:00
publish-docs.sh	b2778f34f5	Rotate keys	2021-05-03 17:18:54 -07:00
behzad nouri	7cea2c4466	validates gossip addresses before sending pull-requests IP addresses need to be validated before sending packets to them. This commit sends a ping packet to nodes before any pull requests. Pull requests are then only sent to the nodes which have responded with the correct hash of their respective ping packet.	2021-05-03 18:21:06 +00:00
behzad nouri	2231017b35	uses Mutex instead of RwLock for ping_cache	2021-05-03 18:21:06 +00:00
behzad nouri	a698e34744	patches local pending push messages processing (#16833 ) process_push_messages writes local pending push messages to the crds table, but it discards the return value: https://github.com/solana-labs/solana/blob/cf779c63c/core/src/crds_gossip.rs#L96-L102 In order to exclude outdated values from the next pull-request, we need to record the hash of values purged/overridden by the local push messages, otherwise pull-responses will return outdated values back to the node: https://github.com/solana-labs/solana/blob/c1829dd00/core/src/crds_gossip_pull.rs#L447-L452 Additionally, gossip packets arrive and are processed out of order. So, local pending push messages should be flushed before generating bloom filters for pull-requests, preventing pull-responses returning the same values back to the node itself. This requires flipping order of generating pull and push messages: https://github.com/solana-labs/solana/blob/cf779c63c/core/src/cluster_info.rs#L1757-L1762 Both above bugs cause redundant traffic and bandwidth waste in gossip pull-responses.	2021-05-03 16:00:17 +00:00
Jeff Washington (jwash)	541aa5ad85	tests: lamports -> lamports() (#16982 )	2021-05-03 10:45:54 -05:00
Justin Starry	8e561354d5	Improve readability of vote lockout processing (#16987 ) * Improve readability of vote lockout processing * clippy * simplify comment * feedback	2021-05-02 08:36:06 +00:00
carllin	5981399612	Distinguish max replayed and max observed vote (#16936 )	2021-04-29 14:43:28 -07:00
Michael Vines	542d88929f	Add getBlockProduction RPC method	2021-04-28 20:02:54 -07:00
carllin	b5d30846d6	Retry latest vote if expired (#16735 )	2021-04-28 11:46:16 -07:00
behzad nouri	25054bfd35	retains peer's contact-info when making pull requests (#16715 ) ClusterInfo::new_pull_requests has to lookup contact-infos: https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/cluster_info.rs#L1663-L1673 when it was already available when making pull requests: https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/crds_gossip_pull.rs#L232	2021-04-28 13:19:12 +00:00
behzad nouri	1ac2a8cfa5	removes delayed crds inserts when upserting gossip table (#16806 ) It is crucial that VersionedCrdsValue::insert_timestamp does not go backward in time: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L67-L79 Otherwise methods such as get_votes and get_epoch_slots_since will break, which will break their downstream flow, including vote-listener and optimistic confirmation: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 For that, Crds::new_versioned is intended to be called "atomically" with Crds::insert_versioned (as the comment already says): https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L126-L129 However, currently this is violated in the code. For example, filter_pull_responses creates VersionedCrdsValues (with the current timestamp), then acquires an exclusive lock on gossip, then process_pull_responses writes those values to the crds table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L2375-L2392 Depending on the workload and lock contention, the insert_timestamps may well be in the past when these values finally are inserted into gossip. To avoid such scenarios, this commit: * removes Crds::new_versioned and Crds::insert_versioned. * makes VersionedCrdsValue constructor private, only invoked in Crds::insert, so that insert_timestamp is populated right before insert. This will improve insert_timestamp monotonicity as long as Crds::insert is not called with a stale timestamp. Following commits may further improve this by calling timestamp() inside Crds::insert, and/or switching to std::time::Instant which guarantees monotonicity.	2021-04-28 11:56:13 +00:00
behzad nouri	b17d5eeaee	moves cluster-info metrics to a separate module (#16883 )	2021-04-28 02:04:49 +00:00
behzad nouri	b468ead1b1	uses current timestamp when flushing local pending push queue (#16808 ) local_message_pending_push_queue is recording timestamps at the time the value is created, and uses that when the pending values are flushed: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L321 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds_gossip.rs#L96-L102 which is then used as the insert_timestamp when inserting values in the crds table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds_gossip_push.rs#L183 The flushing may happen 100ms after the values are created (or even later if there is a lock contention). This will cause non-monotone insert_timestamps in the crds table (where time goes backward), hindering the usability of insert_timestamps for other computations. For example both ClusterInfo::get_votes and get_epoch_slots_since rely on monotone insert_timestamps when values are inserted into the table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 This commit removes timestamps from local_message_pending_push_queue and uses current timestamp when flushing the queue.	2021-04-28 00:15:11 +00:00
steviez	bc31378797	Trim extra shred bytes in blockstore (#16602 ) Strip the zero-padding off of data shreds before insertion into blockstore Co-authored-by: Stephen Akridge <sakridge@gmail.com> Co-authored-by: Nathan Hawkins <utsl@utsl.org>	2021-04-27 17:40:41 -05:00
behzad nouri	3b8d6b59fb	records hash of values purged by expired pull-responses (#16800 ) process_pull_responses should record hash of values purged by expired responses (as well as unexpired ones): https://github.com/solana-labs/solana/blob/c1829dd00/core/src/crds_gossip_pull.rs#L385-L387 otherwise, these values are not excluded from following pull-requests (from likely different nodes): https://github.com/solana-labs/solana/blob/c1829dd00/core/src/crds_gossip_pull.rs#L447-L452 and would waste bandwidth should they be included in subsequent pull-responses.	2021-04-27 12:06:49 +00:00
behzad nouri	0f3ac51cf1	limits to data_header.size when combining shreds' payloads (#16708 ) Shredder::deshred is ignoring data_header.size when combining shreds' payloads: https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/shred.rs#L940-L961 Also adding more sanity checks on the alignment of data shreds indices.	2021-04-27 12:04:44 +00:00
Michael Vines	59fc33635a	Add getVoteAccounts RPC method parameter to restrict results to a single vote account	2021-04-27 04:27:15 +00:00
behzad nouri	9706512115	removes old runtime feature gates in gossip and turbine (#16633 )	2021-04-26 17:12:02 +00:00
Jeff Washington (jwash)	ca14c18998	owner -> owner() (#16782 )	2021-04-23 22:49:47 +00:00
Michael Vines	63436cc2bf	Disable flaky test_poh_service (#16772 )	2021-04-23 12:14:11 -05:00
behzad nouri	2c82f2154d	retains crds values if the origin is still active (#16576 ) Local timestamps are updated for records associated with a pubkey if the origin is still active: https://github.com/solana-labs/solana/blob/c8ed14c64/core/src/crds.rs#L301-L311 However this is done inconsistently on some gossip paths (pull requests and pull responses) but not all (e.g. push messages). Additionally update_record_timestamp is inefficient since there can be ~800 values associated with each pubkey. This commit updates records timestamps only on contact-infos; and, instead utilizes origin's timestamp when purging old values.	2021-04-23 15:14:49 +00:00
behzad nouri	03194145c0	removes first_coding_index from erasure recovery code (#16646 ) first_coding_index is the same as the set_index and so is redundant: https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/blockstore_meta.rs#L49-L60	2021-04-23 12:00:37 +00:00
Justin Starry	75b8434b76	Add TPU client for sending txs to the current leader tpu port (#16736 ) * Add TPU client for sending txs to the current leader tpu port * Update tpu_client.rs	2021-04-23 09:35:12 +08:00
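
Several of the gossip commits above (fa86a335b0, dfa3e7a61c, 22c02b917e) replace timestamp-based "inserted since last query" lookups with an ordinal insert index and a cursor. The following is only a minimal sketch of that general idea, not the actual crds implementation: the names Table, Cursor, insert and get_since are hypothetical and chosen for illustration. Each inserted value receives a strictly increasing ordinal, so a reader's cursor can never move backward the way a wall-clock timestamp can.

```rust
use std::collections::BTreeMap;

/// Reader-side position in the insert-order index.
/// Hypothetical type; it only loosely mirrors the Cursor named in the commit.
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cursor(u64);

/// Toy table that tags every inserted value with a strictly increasing ordinal.
pub struct Table<V> {
    next_ordinal: u64,
    entries: BTreeMap<u64, V>,
}

impl<V: Clone> Table<V> {
    pub fn new() -> Self {
        Self {
            next_ordinal: 0,
            entries: BTreeMap::new(),
        }
    }

    /// Inserts a value; the ordinal comes from a counter, not from system time.
    pub fn insert(&mut self, value: V) {
        self.entries.insert(self.next_ordinal, value);
        self.next_ordinal += 1;
    }

    /// Returns values inserted since the cursor's position, in insert order,
    /// and advances the cursor past everything returned.
    pub fn get_since(&self, cursor: &mut Cursor) -> Vec<V> {
        let values: Vec<V> = self
            .entries
            .range(cursor.0..)
            .map(|(_, value)| value.clone())
            .collect();
        cursor.0 = self.next_ordinal;
        values
    }
}
```

A reader would keep one Cursor per query stream and call get_since repeatedly; each call yields only the values inserted after the previous call, regardless of clock adjustments or lock contention at insert time.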

2705 Commits