solana

Author	SHA1	Message	Date
behzad nouri	5e6b00fe98	prioritizes more recent values in pull responses (#17238 ) On the receiving end, the outdated values are discarded, and they will only waste bandwidth: https://github.com/solana-labs/solana/blob/3f0480d06/core/src/crds_gossip_pull.rs#L385-L400 This is also exacerbating validator start, since the entrypoint is returning old values in pull responses, and the validator immediately discards those; resulting in huge delay until the validator obtains contact-info of the entrypoint and is able to adopt shred-version and fully start.	2021-05-21 14:07:46 +00:00
behzad nouri	e8b35a4f7b	bumps up min number of bloom items in gossip pull requests (#17236 ) When a validator starts, it has an (almost) empty crds table and it only sends one pull-request to the entrypoint. The bloom filter in the pull-request targets 10% false rate given the number of items. So, if the `num_items` is very wrong, it makes a very small bloom filter with a very high false rate: https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80 https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48 As a result, it is very unlikely that the validator obtains entrypoint's contact-info in response. This exacerbates how long the validator will loop on: > Waiting to adopt entrypoint shred version https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412 This commit increases the min number of bloom items when making gossip pull requests. Effectively this will break the entrypoint crds table into 64 shards, one pull-request for each, a larger bloom filter for each shard, and increases the chances that the response will include entrypoint's contact-info, which is needed for adopting shred version and validator start.	2021-05-21 13:59:26 +00:00
behzad nouri	e7073ecab1	adds gossip metrics for number of staked nodes (#17330 )	2021-05-19 19:25:21 +00:00
Tao Zhu	0781fe1b4f	Upgrade Rust to 1.52.0 (#17096 ) * Upgrade Rust to 1.52.0 update nightly_version to newly pushed docker image fix clippy lint errors 1.52 comes with grcov 0.8.0, include this version to script * upgrade to Rust 1.52.1 * disabling Serum from downstream projects until it is upgraded to Rust 1.52.1	2021-05-19 09:31:47 -05:00
behzad nouri	0e646d10bb	prunes received-cache only once per unique owner's key (#17039 )	2021-05-13 13:50:16 +00:00
behzad nouri	0aa7824884	retains one node-instance per pubkey (#17187 ) crds table retains up to 32 node-instance values per each pubkey. This is so because if there are multiple running instances of the same node, then we want gossip to propagate node-instance values associated with both instances, therefore the corresponding label/key includes the randomly generated token in addition to the pubkey: https://github.com/solana-labs/solana/blob/9c42a89a4/core/src/crds_value.rs#L448 https://github.com/solana-labs/solana/pull/14037 As a result, the number of such values per pubkey are effectively unbounded, requiring custom mitigations implemented in: https://github.com/solana-labs/solana/pull/14467 but still taking redundant extra memory and bandwidth. This commit instead retains only one node-instance per pubkey by extending crds values override logic. If a crds value is of type node-instance, it will always override an existing one with the same key if it has more recent starting timestamp (not wallclock). As a result, gossip will always propagate the node-instance with more recent timestamp. Since the check_duplicate logic will stop the node with older timestamp, this change should preserve existing functionality.	2021-05-13 13:35:46 +00:00
behzad nouri	fa86a335b0	implements cursor for gossip crds table queries (#16952 ) VersionedCrdsValue.insert_timestamp is used for fetching crds values inserted since last query: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 So it is crucial that insert_timestamp does not go backward in time when new values are inserted into the table. However std::time::SystemTime is not monotonic, or due to workload, lock contention, thread scheduling, etc, ... new values may be inserted with a stalled timestamp way in the past. Additionally, reading system time for the above purpose is inefficient/unnecessary. This commit adds an ordinal index to crds values indicating their insert order. Additionally, it implements a new Cursor type for fetching values inserted since last query.	2021-05-06 14:04:17 +00:00
carllin	bc7e741514	Integrate gossip votes into switching threshold (#16973 )	2021-05-04 00:51:42 -07:00
behzad nouri	7cea2c4466	validates gossip addresses before sending pull-requests IP addresses need to be validated before sending packets to them. This commit sends a ping packet to nodes before any pull requests. Pull requests are then only sent to the nodes which have responded with the correct hash of their respective ping packet.	2021-05-03 18:21:06 +00:00
behzad nouri	2231017b35	uses Mutex instead of RwLock for ping_cache	2021-05-03 18:21:06 +00:00
behzad nouri	a698e34744	patches local pending push messages processing (#16833 ) process_push_messages writes local pending push messages to the crds table, but it discards the return value: https://github.com/solana-labs/solana/blob/cf779c63c/core/src/crds_gossip.rs#L96-L102 In order to exclude outdated values from the next pull-request, we need to record the hash of values purged/overridden by the local push messages, otherwise pull-responses will return outdated values back to the node: https://github.com/solana-labs/solana/blob/c1829dd00/core/src/crds_gossip_pull.rs#L447-L452 Additionally, gossip packets arrive and are processed out of order. So, local pending push messages should be flushed before generating bloom filters for pull-requests, preventing pull-responses returning the same values back to the node itself. This requires flipping order of generating pull and push messages: https://github.com/solana-labs/solana/blob/cf779c63c/core/src/cluster_info.rs#L1757-L1762 Both above bugs cause redundant traffic and bandwidth waste in gossip pull-responses.	2021-05-03 16:00:17 +00:00
carllin	b5d30846d6	Retry latest vote if expired (#16735 )	2021-04-28 11:46:16 -07:00
behzad nouri	25054bfd35	retains peer's contact-info when making pull requests (#16715 ) ClusterInfo::new_pull_requests has to lookup contact-infos: https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/cluster_info.rs#L1663-L1673 when it was already available when making pull requests: https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/crds_gossip_pull.rs#L232	2021-04-28 13:19:12 +00:00
behzad nouri	b17d5eeaee	moves cluster-info metrics to a separate module (#16883 )	2021-04-28 02:04:49 +00:00
behzad nouri	b468ead1b1	uses current timestamp when flushing local pending push queue (#16808 ) local_message_pending_push_queue is recording timestamps at the time the value is created, and uses that when the pending values are flushed: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L321 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds_gossip.rs#L96-L102 which is then used as the insert_timestamp when inserting values in the crds table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds_gossip_push.rs#L183 The flushing may happen 100ms after the values are created (or even later if there is a lock contention). This will cause non-monotone insert_timestamps in the crds table (where time goes backward), hindering the usability of insert_timestamps for other computations. For example both ClusterInfo::get_votes and get_epoch_slots_since rely on monotone insert_timestamps when values are inserted into the table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 This commit removes timestamps from local_message_pending_push_queue and uses current timestamp when flushing the queue.	2021-04-28 00:15:11 +00:00
behzad nouri	9706512115	removes old runtime feature gates in gossip and turbine (#16633 )	2021-04-26 17:12:02 +00:00
behzad nouri	03194145c0	removes first_coding_index from erasure recovery code (#16646 ) first_coding_index is the same as the set_index and is so redundant: https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/blockstore_meta.rs#L49-L60	2021-04-23 12:00:37 +00:00
Michael Vines	a1ef2bd74d	Ignore flaky test_pull_request_time_pruning	2021-04-21 12:07:36 -07:00
behzad nouri	37b8587d4e	expands number of erasure coding shreds in the last batch in slots (#16484 ) Number of parity coding shreds is always less than the number of data shreds in FEC blocks: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719 Data shreds are batched in chunks of 32 shreds each: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714 However the very last batch of data shreds in a slot can be small, in which case the loss rate can be exacerbated. This commit expands the number of coding shreds in the last FEC block in slots to: 64 - number of data shreds; so that FEC blocks are always 64 data and parity coding shreds each. As a consequence of this, the last FEC block has more parity coding shreds than data shreds. So for some shred indices we will have a coding shred but no data shreds. This should not cause any kind of overlapping FEC blocks as in: https://github.com/solana-labs/solana/pull/10095 since this is done only for the very last batch in a slot, and the next slot will reset the shred index.	2021-04-21 12:47:50 +00:00
Tyera Eulberg	0924c2d070	Add port and gossip options to solana-test-validator (#16696 )	2021-04-21 02:40:52 +00:00
behzad nouri	bc90e04e64	uses current local timestamp when recording purged values CrdsGossipPull.purged_values is meant to record recently purged values so that they are excluded from imminent pull requests, until the entire cluster has synced to the updated value: https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L449-L454 However, VersionedCrdsValue.local_timestamp represents the local time when the value was last updated, and given that crds values may have different timeouts based on stake, it does not necessarily represent how recently the value was purged: https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds.rs#L75-L76 As such, recording current local timestamp when purging values is more appropriate. Additionally, purge_purged assumes that the purge_values is sorted by timestamp when draining the old ones; which is not true if those timestamps are VersionedCrdsValue.local_timestamp: https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L563-L571	2021-04-20 11:21:00 +00:00
Michael Vines	c8b474cd0b	Send votes to next leader's TPU instead of our TPU	2021-04-20 00:38:21 -07:00
Michael Vines	b06e93fe5b	Increase test timeout	2021-04-18 20:55:02 -07:00
behzad nouri	e405747409	Revert "Add limit and shrink policy for recycler (#15320 )" This reverts commit `c2e8814dce`.	2021-04-18 19:29:24 +00:00
behzad nouri	d92721aab9	uses timeouts based on stake for filtering pull responses (#16549 ) filter_pull_responses is using default timeout when discarding pull responses (except for ContactInfo): https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L349-L350 But purging code uses timeouts based on stake: https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L1867-L1870 So the crds value will not be purged from the sender's table and will be sent again over the next pull request.	2021-04-14 20:18:00 +00:00
behzad nouri	f35a6a8be0	prioritizes contact-infos in pull responses (#16541 ) Expired crds values where the contact-info does not exist are wasted: https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L353-L378 and then are sent again over the next pull-request. Also, the stake of the first response (which can be anything) is used to weight all pull-responses to a node, while the rest of responses can have different stake. https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L2231	2021-04-14 18:45:20 +00:00
Justin Starry	85eb37fab0	Merge pull request from GHSA-8v47-8c53-wwrc * Track transaction check time separately from account loads * banking packet process metrics * Remove signature clone in status cache lookup * Reduce allocations when converting packets to transactions * Add blake3 hash of transaction messages in status cache * Bug fixes * fix tests and run fmt * Address feedback * fix simd tx entry verification * Fix rebase * Feedback * clean up * Add tests * Remove feature switch and fall back to signature check * Bump programs/bpf Cargo.lock * clippy * nudge benches * Bump `BankSlotDelta` frozen ABI hash` * Add blake3 to sdk/programs/Cargo.lock * nudge bpf tests * short circuit status cache checks Co-authored-by: Trent Nelson <trent@solana.com>	2021-04-13 00:28:08 -06:00
Christian Drappi	54a04bac3d	Apple M1 compatibility (#16346 ) Co-authored-by: Christian Drappi <christiandrappi@Christians-MacBook-Pro.local>	2021-04-09 17:21:01 -07:00
behzad nouri	22a18a68e3	stops consuming pinned vectors with a recycler (#16441 ) If the vector is pinned and has a recycler, From<PinnedVec> implementation of Vec should clone (instead of consuming) the underlying vector so that the next allocation of a PinnedVec will recycle an already pinned one.	2021-04-09 16:55:24 +00:00
Trent Nelson	b71875df61	cluster-info: Get rid of some integer math while we're here	2021-04-06 00:09:37 +00:00
Trent Nelson	b6b08706b9	cluster-info: Don't subtract non-shred spies from node count	2021-04-06 00:09:37 +00:00
behzad nouri	b041b55028	makes test_pull_request_time_pruning smaller (#16128 )	2021-03-25 22:44:43 +00:00
behzad nouri	a6c23648cb	limits CrdsGossipPull::pull_request_time size (#15793 ) There is no pruning logic on CrdsGossipPull::pull_request_time https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_pull.rs#L172-L174 potentially allowing this to take too much memory. Additionally, CrdsGossipPush::last_pushed_to is pruning recent push timestamps: https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_push.rs#L275-L279 instead of the older ones. Co-authored-by: Nathan Hawkins <utsl@utsl.org>	2021-03-24 18:33:56 +00:00
behzad nouri	570fd3f810	makes turbine peer computation consistent between broadcast and retransmit (#14910 ) get_broadcast_peers is using tvu_peers: https://github.com/solana-labs/solana/blob/84e52b606/core/src/broadcast_stage.rs#L362-L370 which is potentially inconsistent with retransmit_peers: https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1332-L1345 Also, the leader does not include its own contact-info when broadcasting shreds: https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1324 but on the retransmit side, slot leader is removed only _after_ neighbors and children are computed: https://github.com/solana-labs/solana/blob/84e52b606/core/src/retransmit_stage.rs#L383-L384 So the turbine broadcast tree is different between the two stages. This commit: * Removes retransmit_peers. Broadcast and retransmit stages will use tvu_peers consistently. * Retransmit stage removes slot leader _before_ computing children and neighbors.	2021-03-24 13:34:48 +00:00
behzad nouri	f2865dfd63	requires stakes for propagating crds values through gossip (#15561 )	2021-03-12 15:50:14 +00:00
behzad nouri	56923c91bf	limits number of unique pubkeys in the crds table (#15539 )	2021-03-10 20:46:05 +00:00
behzad nouri	5a9896706c	indexes epoch slots in crds table (#15459 ) ClusterInfo::get_epoch_slots_since scans the entire crds table to obtain epoch-slots inserted since a timestamp: https://github.com/solana-labs/solana/blob/013daa8f4/core/src/cluster_info.rs#L1245-L1262 The alternative is to index epoch-slots in crds table ordered by their insert timestamp.	2021-02-26 14:12:04 +00:00
carllin	c2e8814dce	Add limit and shrink policy for recycler (#15320 )	2021-02-24 00:15:58 -08:00
Michael Vines	5df36aec7d	Pacify clippy	2021-02-19 20:08:41 -08:00
behzad nouri	aa3aac766f	adds metrics for inbound/outbound gossip packets counts (#15407 )	2021-02-19 22:49:35 +00:00
behzad nouri	076c20f1ca	checks that prune-messages have the same inner/outer pubkey (#15352 )	2021-02-16 21:06:18 +00:00
behzad nouri	0ad063f4e9	adds flag to disable duplicate instance check (#15006 )	2021-02-03 16:26:17 +00:00
dependabot[bot]	1df93fa2be	chore: bump serde from 1.0.112 to 1.0.118 (#14828 ) * chore: bump serde from 1.0.112 to 1.0.122 Bumps [serde](https://github.com/serde-rs/serde) from 1.0.112 to 1.0.122. - [Release notes](https://github.com/serde-rs/serde/releases) - [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.122) Signed-off-by: dependabot[bot] <support@github.com> * [auto-commit] Update all Cargo lock files * Update frozen_abi digest following serde update * Revert "chore: bump serde from 1.0.112 to 1.0.122" This reverts commit `a3ef4442a4`. * Revert "[auto-commit] Update all Cargo lock files" This reverts commit `c41c3b005f`. * chore: bump serde from 1.0.112 to 1.0.118 Bumps [serde](https://github.com/serde-rs/serde) from 1.0.112 to 1.0.118. - [Release notes](https://github.com/serde-rs/serde/releases) - [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.118) Signed-off-by: dependabot[bot] <support@github.com> * [auto-commit] Update all Cargo lock files * Remove serum-dex pinning * blind commit! Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: dependabot-buildkite <dependabot-buildkite@noreply.solana.com> Co-authored-by: Ryo Onodera <ryoqun@gmail.com>	2021-02-02 23:28:16 +09:00
behzad nouri	e1021d9f83	removes redundant epoch stakes cache in retransmit (#14781 ) Following `d6d76219b`, staked nodes computed from vote accounts are already cached in runtime::Stakes, so the caching in retransmit_stage is redundant.	2021-01-24 21:15:09 +00:00
behzad nouri	491b059755	broadcasts duplicate shreds through gossip (#14699 )	2021-01-24 15:47:43 +00:00
behzad nouri	8e581601d6	patches crds vote-index assignment bug (#14438 ) If tower is full, old votes are evicted from the front of the deque: https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L367-L373 whereas recent votes, if expired, are evicted from the back: https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L529-L537 As a result, from a single tower_index scalar, we cannot infer which crds-vote should be overwritten: https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L576 In addition there is an off-by-one bug in the existing code. tower_index is bounded by MAX_LOCKOUT_HISTORY - 1: https://github.com/solana-labs/solana/blob/2074e407c/core/src/consensus.rs#L382 So, it is at most 30, whereas MAX_VOTES is 32: https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L29 Which means that this branch is never taken: https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L590-L593 so crds table always keeps 29 oldest votes by wallclock, and then only overrides the 30th one each time. (i.e., a tally of only two most recent votes).	2021-01-21 13:08:07 +00:00
behzad nouri	b5fd0ed859	rewrites turbine retransmit peers computation (#14584 )	2021-01-19 04:18:47 +00:00
Michael Vines	9ddd6f08e8	Persist gossip contact info	2020-12-27 20:46:54 -08:00
behzad nouri	2fd38d9912	indexes votes in crds table (#14272 )	2020-12-27 13:31:05 +00:00
behzad nouri	49019c6613	obtains staked-nodes from the root-bank (#14257 ) ... as opposed to the working bank	2020-12-27 13:28:05 +00:00
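
The bloom-filter sizing problem described in e8b35a4f7b (#17236) can be illustrated with the textbook bloom-filter formulas. This is a minimal sketch under stated assumptions, not the code in runtime/src/bloom.rs: the function names are hypothetical, and the formulas are the standard ones (bits m = ceil(-n ln p / (ln 2)^2), hash keys k = round((m / n) ln 2), false rate (1 - e^(-k n' / m))^k when n' items are actually inserted).

```rust
use std::f64::consts::LN_2;

/// Bits needed for a filter sized for `num_items` at the target `false_rate`.
/// (Hypothetical helper; standard textbook formula.)
fn bloom_num_bits(num_items: f64, false_rate: f64) -> f64 {
    (-num_items * false_rate.ln() / LN_2.powi(2)).ceil()
}

/// Number of hash keys for a filter of `num_bits` sized for `num_items`.
fn bloom_num_keys(num_bits: f64, num_items: f64) -> f64 {
    (num_bits / num_items * LN_2).round().max(1.0)
}

/// Approximate false-positive rate once `actual_items` have been inserted
/// into a filter with `num_bits` bits and `num_keys` hash keys.
fn effective_false_rate(num_bits: f64, num_keys: f64, actual_items: f64) -> f64 {
    (1.0 - (-num_keys * actual_items / num_bits).exp()).powf(num_keys)
}
```

With a 10% target, a filter sized for 100 items stays near 10% when it holds 100 items, but its false rate approaches 100% if it ends up holding far more items than `num_items` predicted. Raising the minimum item count makes each filter larger, which is the effect the commit message describes.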


318 Commits
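
As a worked example of the arithmetic in 37b8587d4e (#16484): the last batch of data shreds in a slot gets 64 minus the number of data shreds as coding shreds, so the final FEC block always totals 64 shreds. The sketch below only restates that rule; the constant and function names are hypothetical and are not taken from ledger/src/shred.rs.

```rust
/// Data shreds are batched in chunks of 32 (from the commit message).
const DATA_SHREDS_PER_BATCH: usize = 32;
/// Target size of the last FEC block: data plus coding shreds.
const LAST_FEC_BLOCK_SIZE: usize = 2 * DATA_SHREDS_PER_BATCH;

/// Number of coding shreds for the very last batch in a slot
/// (hypothetical helper; assumes the batch holds at most 32 data shreds).
fn last_batch_coding_shreds(num_data_shreds: usize) -> usize {
    // saturating_sub guards against underflow if the assumption is violated.
    LAST_FEC_BLOCK_SIZE.saturating_sub(num_data_shreds)
}
```

For instance, a final batch of 5 data shreds would get 59 coding shreds, while a full batch of 32 still gets 32.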