140 Commits

Author SHA1 Message Date
Tyera Eulberg
e9bef42502 Bump version to v1.7.18 (#21011) 2021-10-26 23:02:09 -06:00
mergify[bot]
b216e3c9f7 Add CrdsData::IncrementalSnapshotHashes (backport #20374) (#20983)
* Add CrdsData::IncrementalSnapshotHashes (#20374)

(cherry picked from commit 4e3818e5c1)

# Conflicts:
#	gossip/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: Brooks Prumo <brooks@solana.com>
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-10-26 17:24:39 +00:00
Justin Starry
e3e1396c1d Bump version to v1.7.17 (#20896) 2021-10-23 09:48:15 -04:00
mergify[bot]
9ff9234280 adds metrics for number of nodes vs number of pubkeys (backport #20512) (#20523)
* adds metrics for number of nodes vs number of pubkeys (#20512)

(cherry picked from commit 0da661de62)

# Conflicts:
#	gossip/src/cluster_info_metrics.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-10-08 15:20:27 +00:00
Trent Nelson
33c24ec3ae Bump version to 1.7.16 (#20519) 2021-10-07 18:29:40 -06:00
Tyera Eulberg
734b380cdb Bump version to v1.7.15 (#20338) 2021-09-30 10:51:34 -06:00
sakridge
fec15f69f4 Increment 1.7 version (#20316) 2021-09-29 15:37:45 -04:00
sakridge
257ddbeee1 Tpu vote 1.7 (#20187)
* Add separate vote processing tpu port

* Add feature to send to tpu vote port

* Add vote rejecting sigverify mode

* use packet.meta.is_simple_vote_tx in place of deserialization

* consolidate code that identifies vote tx atcommon path for cpu and gpu

* new key for feature set

* banking forward tpu vote

* add tpu vote port to dockerfile and other review changes

* Simplify thread id compare

* fix a test; updated cluster_info ABI change

Co-authored-by: Tao Zhu <tao@solana.com>
2021-09-29 18:12:58 +02:00
mergify[bot]
4c4f183515 reverts #17542 (#20259) (#20273)
https://github.com/solana-labs/solana/pull/17542
excludes caller's crds values from pull responses.

Reverting that commit so that when a (staked) node restarts, it can
obtain its crds values before restart from other nodes.

(cherry picked from commit 43ed727ba7)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 12:54:31 +00:00
mergify[bot]
a1a0c63862 retransmits shreds recovered from erasure codes (backport #19233) (#20249)
* removes packet-count metrics from retransmit stage

Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.

Following commit will add these metrics back to window-service, earlier
in the pipeline.

(cherry picked from commit bf437b0336)

# Conflicts:
#	core/src/retransmit_stage.rs

* adds packet/shred count stats to window-service

Adding back these metrics from the earlier commit which removed them
from retransmit stage.

(cherry picked from commit 8198a7eae1)

* removes erroneous uses of Arc<...> from retransmit stage

(cherry picked from commit 6e413331b5)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/tvu.rs

* sends shreds (instead of packets) to retransmit stage

Working towards channelling through shreds recovered from erasure codes
to retransmit stage.

(cherry picked from commit 3efccbffab)

# Conflicts:
#	core/src/retransmit_stage.rs

* returns completed-data-set-info from insert_data_shred

instead of opaque (u32, u32) which are then converted to
CompletedDataSetInfo at the call-site.

(cherry picked from commit 3c71670bd9)

# Conflicts:
#	ledger/src/blockstore.rs

* retransmits shreds recovered from erasure codes

Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.

(cherry picked from commit 7a8807b8bb)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-27 18:11:37 +00:00
mergify[bot]
e9a993fb59 allows sendmmsg api taking owned values (as well as references) (#18999) (#20226)
Current signature of api in sendmmsg requires a slice of inner
references:
https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152

That forces the call-site to convert owned values to references even
though doing so is redundant and adds an extra level of indirection:
https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291

This commit expands the api using AsRef and Borrow traits to allow
calling the method with owned values (as well as references like
before).

(cherry picked from commit 049fb0417f)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 20:35:13 +00:00
mergify[bot]
aacb5e58ad sendmmsg cleanup #18589 (#20175)
Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.

(cherry picked from commit ae5ad5cf9b)

Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
2021-09-25 23:00:00 +00:00
mergify[bot]
e50a26a493 passes through --allow-private-addr to validators in system perf tests (backport #18876) (#20180)
* passes through --allow-private-addr to validators in system perf tests (#18876)

(cherry picked from commit 81026f9ea5)

# Conflicts:
#	multinode-demo/bootstrap-validator.sh
#	multinode-demo/validator.sh
#	net/net.sh

* removes backport merge conflicts

* ignores RUSTSEC-2021-0115

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-25 20:08:32 +00:00
mergify[bot]
99c74c8902 Limit transaction forwarding from banking_stage (#19940) (#20081)
(cherry picked from commit 013e1d9d49)

Co-authored-by: sakridge <sakridge@gmail.com>
2021-09-24 18:04:22 +00:00
sakridge
70d556782b Bump 1.7 version (#19943) 2021-09-16 13:16:09 -06:00
behzad nouri
7ac0ea0885 filters for recent contact-infos when checking for live stake (#19204)
Contact-infos are saved to disk:
https://github.com/solana-labs/solana/blob/9dfeee299/gossip/src/cluster_info.rs#L1678-L1683

and restored on validator start-up:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L450

Staked nodes entries will not expire until an epoch after. So when the
validator checks for online stake it is erroneously picking up
contact-infos restored from disk, which breaks the entire
wait-for-supermajority logic:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L1515-L1561

This commit adds an extra check for the age of contact-info entries and
filters out old ones.

(cherry picked from commit 7a789e0763)

# Conflicts:
#	core/src/validator.rs
2021-09-09 23:33:29 -07:00
mergify[bot]
5fbcb10e6f adds logs when push-vote panics with invalid vote-index (#19485) (#19521)
In order to debug this panic on the clusters:

  panicked at 'assertion failed: (vote_index as usize) <
  MAX_LOCKOUT_HISTORY', core/src/cluster_info.rs:1012:9

(cherry picked from commit d7051b0d21)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-08-31 15:35:33 +00:00
Tyera Eulberg
f73a61d2ec Bump version to 1.7.12 2021-08-27 16:24:24 +00:00
Trent Nelson
ab5d032634 Bump version to v1.7.11 2021-08-12 06:55:18 +00:00
Trent Nelson
b7f1f19d8e Bump version to v1.7.10 2021-07-31 01:19:33 -06:00
mergify[bot]
eacc69efba adds validator flag to allow private ip addresses (backport #18850) (#18975)
* adds validator flag to allow private ip addresses (#18850)

(cherry picked from commit d2d5f36a3c)

# Conflicts:
#	accounts-cluster-bench/Cargo.toml
#	bench-tps/Cargo.toml
#	cli/Cargo.toml
#	core/benches/cluster_info.rs
#	core/src/banking_stage.rs
#	core/src/broadcast_stage.rs
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs
#	core/src/cluster_slots_service.rs
#	core/src/repair_service.rs
#	core/src/tvu.rs
#	core/src/validator.rs
#	dos/Cargo.toml
#	gossip/src/cluster_info.rs
#	gossip/src/crds_gossip_pull.rs
#	gossip/src/crds_gossip_push.rs
#	gossip/src/gossip_service.rs
#	local-cluster/Cargo.toml
#	local-cluster/src/cluster_tests.rs
#	local-cluster/tests/local_cluster.rs
#	rpc/Cargo.toml
#	rpc/src/rpc.rs
#	tokens/Cargo.toml
#	validator/Cargo.toml
#	validator/src/main.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-29 21:43:24 +00:00
mergify[bot]
49b0d1792b filters crds values in parallel when responding to gossip pull-requests (backport #18877) (#18901)
* filters crds values in parallel when responding to gossip pull-requests (#18877)

When responding to gossip pull-requests, filter_crds_values takes a lot of time
while holding onto read-lock:
https://github.com/solana-labs/solana/blob/f51d64868/gossip/src/crds_gossip_pull.rs#L509-L566

This commit will filter-crds-values in parallel using rayon thread-pools.

(cherry picked from commit f1198fc6d5)

# Conflicts:
#	gossip/src/cluster_info.rs
#	gossip/src/crds_gossip_pull.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-26 21:30:24 +00:00
Michael Vines
548ddff7ed Bump version to v1.7.9 2021-07-24 11:23:44 -06:00
Ryo Onodera
1cc8de0fed Bump version to v1.7.8 (#18866) 2021-07-24 01:14:03 +09:00
Trent Nelson
19049ca91b Bump version to v1.7.7 2021-07-17 08:42:22 +00:00
mergify[bot]
df9061b933 excludes private ip addresses (#18740)
(cherry picked from commit e316586516)

# Conflicts:
#	core/src/broadcast_stage.rs
#	gossip/src/cluster_info.rs

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-17 04:10:43 +00:00
mergify[bot]
4145c629c0 removes id from push_lowest_slot args (backport #18645) (#18649)
* removes id from push_lowest_slot args (#18645)

push_lowest_slot cannot sign the new crds-value unless the id (pubkey)
argument passed-in is the same pubkey as in ClusterInfo::keypair(), in
which case the id argument is redundant:
https://github.com/solana-labs/solana/blob/bb41cf346/gossip/src/cluster_info.rs#L824-L845

Additionally, the lookup is done with self.id(), but insert is done with
the id argument, which is logically a bug.

(cherry picked from commit c90af3cd63)

# Conflicts:
#	gossip/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-16 18:31:54 +00:00
sakridge
551dc0a74c Bump 1.7 version (#18723) 2021-07-16 09:43:18 -06:00
mergify[bot]
6abd5fbc3e removes redundant (mutable) self receivers (backport #18574) (#18580)
* removes redundant (mutable) self receivers (#18574)

(cherry picked from commit 918b5c28b2)

# Conflicts:
#	gossip/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-11 01:02:20 +00:00
mergify[bot]
9900c1ad8a adds a generic implementation of Gossip{Read,Write}Lock (#18559) (#18569)
(cherry picked from commit fd9c10c2e2)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-10 15:51:29 +00:00
mergify[bot]
5d0e1d62d5 removes id and shred_version from CrdsGossip (backport #18505) (#18554)
* removes id and shred_version from CrdsGossip (#18505)

ClusterInfo is the gateway to CrdsGossip function calls, and it already
has node's pubkey and shred version (full ContactInfo and Keypair in
fact).
Duplicating these data in CrdsGossip adds redundancy and possibility for
bugs should they not be consistent with ClusterInfo.

(cherry picked from commit 4e1333fbe6)

# Conflicts:
#	gossip/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-09 15:37:05 +00:00
mergify[bot]
7e613c7a78 skips process_push_message for local messages (backport #18493) (#18532)
* skips process_push_message for local messages (#18493)

received_cache is not relevant for local messages, and does not need to
be updated:
https://github.com/solana-labs/solana/blob/92c5cdab6/gossip/src/crds_gossip_push.rs#L166-L189

(cherry picked from commit 27cc7577a1)

# Conflicts:
#	gossip/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-09 13:33:40 +00:00
Justin Starry
2a73ba2fbb Fix cargo check (#18497) 2021-07-07 14:01:19 -05:00
mergify[bot]
d668a7694f implements an unbiased weighted shuffle using binary indexed tree (#18343) (#18485)
Current implementation of weighted_shuffle:
https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37
uses a heuristic which results in biased samples.

For example, if the weights are [1, 10, 100], then the 3rd index should
come first 100 times more often than the 1st index. However,
weighted_shuffle is picking the 3rd index 200+ times more often than the
1st index, showing a disproportional bias in favor of higher weights.

This commit implements weighted shuffle using binary indexed tree to
maintain cumulative sum of weights while sampling. The resulting samples
are demonstrably unbiased and precisely proportional to the weights.

Additionally the iterator interface allows to skip computations when
not all indices are processed.

Of the use cases of weighted_shuffle, changing turbine code requires
feature-gating to keep the cluster in sync. That is not updated in
this commit, but can be done together with future updates to turbine.

(cherry picked from commit dba42c57b4)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-07 16:07:41 +00:00
mergify[bot]
c534c928a7 encapsulates turbine peers computations of broadcast & retransmit stages (#18238) (#18464)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.

(cherry picked from commit 04787be8b1)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-07 14:30:55 +00:00
mergify[bot]
daf2c3c155 Consider all peers as potential candidates during pull-request in case of offline nodes (#18333) (#18372)
* Try all peers during pull-request in case of offline nodes

* fix clippy err

(cherry picked from commit f4fb5de545)

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
2021-07-01 22:56:20 +00:00
Trent Nelson
4466aa39c4 Bump version to v1.7.5 2021-06-30 22:55:01 -06:00
mergify[bot]
287daa9248 debug logs when crds table trim failed (#18307) (#18309)
reports of this error being possibly spammy:
https://discord.com/channels/428295358100013066/689412830075551748/859441080054710293

The commit changes the log level to debug.
Additionally adding a new metric to understand the frequency of this error.

(cherry picked from commit 9d983a34a0)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-29 22:43:33 +00:00
mergify[bot]
677664e71d filters crds values obtained through gossip by their shred version (backport #18072) (#18175)
* filters crds values obtained through gossip by their shred version (#18072)

filter_by_shred_version does not check the shred-version of the owner of
the crds-value. It only checks the shred-version of the node which is
relaying the value:
https://github.com/solana-labs/solana/blob/5cc073420/gossip/src/cluster_info.rs#L2274-L2289

So crds-values with different shred versions can still pass through this
function as long as they are relayed by a node with matching shred
version; and so, a single node can bridge different shred values
through-out the cluster.

(cherry picked from commit 69a5f0e6cd)

# Conflicts:
#	gossip/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-23 17:16:33 +00:00
Trent Nelson
597429ab3e Bump version to v1.7.4 2021-06-22 19:57:34 +00:00
mergify[bot]
363b75619f adds shred-version to ip-echo-server response (backport #18066) (#18113)
* adds shred-version to ip-echo-server response

When starting a validator, the node initially joins gossip with
shred_verison = 0, until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417

Depending on the load on the entrypoint, this adopting entrypoint
shred-version through gossip sometimes becomes very slow, and causes
several problems in gossip because we have to partially support
shred_version == 0 which is a source of leaking crds values from one
cluster to another. e.g. see
https://github.com/solana-labs/solana/pull/17899
and the other linked issues there.

In order to remove shred_version == 0 from gossip, this commit adds
shred-version to ip-echo-server response. Once the entrypoints are
updated, on validator start-up, if --expected_shred_version is not
specified we will obtain shred-version from the entrypoint using
ip-echo-server.

(cherry picked from commit 598093b5db)

# Conflicts:
#	Cargo.lock
#	net-utils/Cargo.toml
#	programs/bpf/Cargo.lock

* removes backport merge conflicts

* obtains shred-version from entrypoint's ip-echo-server in validator-main

(cherry picked from commit 58e115275a)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-21 23:04:08 +00:00
mergify[bot]
0e7512a225 Fix Nightly Clippy Warnings (backport #18065) (#18070)
* chore: cargo +nightly clippy --fix -Z unstable-options

(cherry picked from commit 6514096a67)

# Conflicts:
#	core/src/banking_stage.rs
#	core/src/cost_model.rs
#	core/src/cost_tracker.rs
#	core/src/execute_cost_table.rs
#	core/src/replay_stage.rs
#	core/src/tvu.rs
#	ledger-tool/src/main.rs
#	programs/bpf_loader/build.rs
#	rbpf-cli/src/main.rs
#	sdk/cargo-build-bpf/src/main.rs
#	sdk/cargo-test-bpf/src/main.rs
#	sdk/src/secp256k1_instruction.rs

* chore: cargo fmt

(cherry picked from commit 789f33e8db)

* Updates BPF program assert_instruction_count tests.

(cherry picked from commit c1e03f3410)

# Conflicts:
#	programs/bpf/tests/programs.rs

* Resolve conflicts

Co-authored-by: Alexander Meißner <AlexanderMeissner@gmx.net>
Co-authored-by: Michael Vines <mvines@gmail.com>
2021-06-18 20:02:48 +00:00
mergify[bot]
c330016109 adds mapping from nodes pubkeys to their shred-version (#17940) (#18067)
Crds values of nodes with different shred versions are creeping into
gossip table resulting in runtime issues as the one addressed in:
https://github.com/solana-labs/solana/pull/17899

This commit works towards enforcing more checks and filtering based on
shred version by adding necessary mapping and api to gossip table.
Once populated, pubkey->shred-version mapping persists as long as there
are any values associated with the pubkey.

(cherry picked from commit 5a99fa3790)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-18 18:02:23 +00:00
Stephen Akridge
d159ae9342 Bump version to v1.7.3 2021-06-17 15:34:50 -06:00
mergify[bot]
e2e41a29eb Don't use pinned memory when unnecessary (#17832) (#17934)
Reports of excessive GPU memory usage and errors
from cudaHostRegister. There are some cases where pinning is
not required.

(cherry picked from commit eeee75c5be)

Co-authored-by: sakridge <sakridge@gmail.com>
2021-06-14 16:30:51 +00:00
mergify[bot]
16b1a4d003 short cuts expiration check if origin's contact-info is still valid (#17918) (#17921)
Crds::find_old_labels can skip checking values timestamps if the
origin's contact info hasn't expired yet:
https://github.com/solana-labs/solana/blob/985280ec0/gossip/src/crds.rs#L394-L408

(cherry picked from commit cca46308bc)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-13 21:12:24 +00:00
mergify[bot]
b51ea3ca0c excludes epoch-slots from nodes with unknown or different shred version (#17899) (#17916)
Inspecting TDS gossip table shows that crds values of nodes with
different shred-versions are creeping in. Their epoch-slots are
accumulated in ClusterSlots causing bogus slots very far from current
root which are not purged and so cause ClusterSlots keep consuming more
memory:
https://github.com/solana-labs/solana/issues/17789
https://github.com/solana-labs/solana/issues/14366#issuecomment-769896036
https://github.com/solana-labs/solana/issues/14366#issuecomment-832754654

This commit updates ClusterInfo::get_epoch_slots, and discards entries
from nodes with unknown or different shred-version.

Follow up commits will patch gossip not to waste bandwidth and memory
over crds values of nodes with different shred-version.

(cherry picked from commit 985280ec0b)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-13 15:56:05 +00:00
Ryo Onodera
48e565038a Bump version to v1.7.2 (#17831) 2021-06-08 10:29:39 +00:00
mergify[bot]
3d5f333a3b parallelizes gossip packets receiver with processing of requests (#17647) (#17807)
Gossip packet processing is composed of two stages:
  * The first is consuming packets from the socket, deserializing,
    sanitizing and verifying them:
    https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2510-L2521
  * The second is actually processing the requests/messages:
    https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2585-L2605

The former does not acquire any locks and so can be parallelized with
the later, allowing better pipelineing properties and smaller latency in
responding to gossip requests or propagating messages.

(cherry picked from commit cab30e2356)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-07 20:30:31 +00:00
mergify[bot]
0de1ce0c2c adds fallback logic if retransmit multicast fails (#17714) (#17742)
In retransmit-stage, based on the packet.meta.seed and resulting
children/neighbors, each packet is sent to a different set of peers:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457

However, current code errors out as soon as a multicast call fails,
which will skip all the remaining packets:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can exacerbate packets loss in turbine.

This commit:
  * keeps iterating over retransmit packets for loop even if some
    intermediate sends fail.
  * adds a fallback to UdpSocket::send_to if multicast fails.

Recent discord chat:
https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733

(cherry picked from commit be957f25c9)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-04 16:20:44 +00:00