Commit Graph

302 Commits

Author SHA1 Message Date
mergify[bot]
1be989c5d2 adds fallback logic if retransmit multicast fails (backport #17714) (#18498)
* adds fallback logic if retransmit multicast fails (#17714)

In retransmit-stage, based on the packet.meta.seed and resulting
children/neighbors, each packet is sent to a different set of peers:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457

However, current code errors out as soon as a multicast call fails,
which will skip all the remaining packets:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can exacerbate packets loss in turbine.

This commit:
  * keeps iterating over retransmit packets for loop even if some
    intermediate sends fail.
  * adds a fallback to UdpSocket::send_to if multicast fails.

Recent discord chat:
https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733

(cherry picked from commit be957f25c9)

# Conflicts:
#	core/src/cluster_info.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-07 22:36:32 +00:00
mergify[bot]
24ee0b3934 Avoid full-range compactions with periodic filtered b.g. ones (backport #16697) (#17956)
* Avoid full-range compactions with periodic filtered b.g. ones (#16697)

* Update rocksdb to v0.16.0

* Promote the infrequent and important log to info!

* Force background compaction by ttl without manual compaction

* Fix test

* Support no compaction mode in test_ledger_cleanup_compaction

* Fix comment

* Make compaction_interval customizable

* Avoid major compaction with periodic filtering...

* Adress lazy_static, special cfs and range check

* Clean up a bit and add comment

* Add comment

* More comments...

* Config code cleanup

* Add comment

* Use .conflicts_with()

* Nullify unneeded delete_range ops for special CFs

* Some clean ups

* Clarify the locking intention

* Ensure special CFs' consistency with PurgeType::CompactionFilter

* Fix comment

* Fix bad copy paste

* Fix various types...

* Don't use tuples

* Add a unit test for compaction_filter

* Fix typo...

* Remove flag and just use new behavior always

* Fix wrong condition negation...

* Doc. about no set_last_purged_slot in purge_slots

* Write a test and fix off-by-one bug....

* Apply suggestions from code review

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Follow up to github review suggestions

* Fix line-wrapping

* Fix conflict

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
(cherry picked from commit 1f97b2365f)

# Conflicts:
#	Cargo.lock
#	ledger/src/blockstore_db.rs

* Fix conflicts

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2021-06-15 08:49:13 +00:00
mergify[bot]
b7dc7d859c removes Crds::lookup and lookup_versioned (backport #17438) (#17441)
* removes Crds::lookup and lookup_versioned (#17438)

(cherry picked from commit e867d7f3b8)

* patches push_epoch_slots for backport

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-24 20:10:44 +00:00
mergify[bot]
0db23fee53 encapsulates purged values bookkeeping into crds module (#17265) (#17436)
For all code paths (gossip push, pull, purge, etc) that remove or
override a crds value, it is necessary to record hash of values purged
from crds table, in order to exclude them from subsequent pull-requests;
otherwise the next pull request will likely return outdated values,
wasting bandwidth:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491

Currently this is done all over the place in multiple modules, and this
has caused bugs in the past where purged values were not recorded.

This commit encapsulated this bookkeeping into crds module, so that any
code path which removes or overrides a crds value, also records the hash
of purged value in-place.

(cherry picked from commit 9d112cf41f)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-24 15:04:24 +00:00
mergify[bot]
b08c0caefe adds metric for turbine retransmit tree mismatch (backport #17351) (#17392)
* adds metric for turbine retransmit tree mismatch

In order to remove port-based forwarding logic in turbine, we need to
first track how often the turbine retransmit/broadcast trees mismatch
across nodes.
One consistency condition is that if the node is on the critical path
(i.e. the first node in each neighborhood), then we expect that the
packet arrives at tvu socket as opposed to tvu-forwards.

This commit adds a metric to track how often above condition is not met.

(cherry picked from commit 71de021177)

* removes the nested for loop from retransmit-stage

The code can be simplified by just flattening the vector of packets.

(cherry picked from commit ff0e623d30)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-21 20:08:12 +00:00
mergify[bot]
25333abd96 extends crds values timeouts if stakes are unknown (#17261) (#17389)
If stakes are unknown, then timeouts will be short, resulting in values
being purged from the crds table, and consequently higher pull-response
load when they are obtained again from gossip. In particular, this slows
down validator start where almost all values obtained from entrypoint
are immediately discarded.

(cherry picked from commit 2adce67260)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-21 17:29:37 +00:00
mergify[bot]
d2e98cb531 prunes received-cache only once per unique owner's key (#17039) (#17337)
(cherry picked from commit 0e646d10bb)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-19 22:50:42 +00:00
mergify[bot]
733ef4b0b8 type AccountSecondaryIndexes = HashSet (backport #17108) (#17149)
* type AccountSecondaryIndexes = HashSet (#17108)

(cherry picked from commit f39dda00e0)

# Conflicts:
#	runtime/benches/accounts.rs
#	runtime/src/accounts.rs
#	runtime/src/accounts_db.rs
#	runtime/src/accounts_index.rs

* resolve merge errors

Co-authored-by: Jeff Washington (jwash) <75863576+jeffwashington@users.noreply.github.com>
Co-authored-by: Jeff Washington (jwash) <wash678@gmail.com>
2021-05-10 20:55:33 +00:00
mergify[bot]
094271be7d indexes crds values by their insert order (backport #16809) (#17132)
* indexes crds values by their insert order

(cherry picked from commit dfa3e7a61c)

* reads gossip push messages off crds ordinal index

Having an ordinal index on crds values based on insert order allows to
efficiently filter values using a cursor. In particular
CrdsGossipPush::push_messages hash-map can be replaced with a cursor,
saving on the bookkeepings, purging, etc

(cherry picked from commit 22c02b917e)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-10 00:00:00 +00:00
mergify[bot]
330f42c375 implements cursor for gossip crds table queries (#16952) (#17084)
VersionedCrdsValue.insert_timestamp is used for fetching crds values
inserted since last query:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298

So it is crucial that insert_timestamp does not go backward in time when
new values are inserted into the table. However std::time::SystemTime is
not monotonic, or due to workload, lock contention, thread scheduling,
etc, ... new values may be inserted with a stalled timestamp way in the
past. Additionally, reading system time for the above purpose is
inefficient/unnecessary.

This commit adds an ordinal index to crds values indicating their insert
order. Additionally, it implements a new Cursor type for fetching values
inserted since last query.

(cherry picked from commit fa86a335b0)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-06 15:23:01 +00:00
mergify[bot]
834c96a374 validates gossip addresses before sending pull-requests (backport #16748) (#17009)
* uses Mutex instead of RwLock for ping_cache

(cherry picked from commit 2231017b35)

* validates gossip addresses before sending pull-requests

IP addresses need to be validated before sending packets to them.
This commit, sends a ping packet to nodes before any pull requests.
Pull requests are then only sent to the nodes which have responded with
the correct hash of their respective ping packet.

(cherry picked from commit 7cea2c4466)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-05-03 19:40:02 +00:00
mergify[bot]
25aee12502 retains peer's contact-info when making pull requests (#16715) (#16907)
ClusterInfo::new_pull_requests has to lookup contact-infos:
https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/cluster_info.rs#L1663-L1673

when it was already available when making pull requests:
https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/crds_gossip_pull.rs#L232

(cherry picked from commit 25054bfd35)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-04-28 14:54:54 +00:00
mergify[bot]
d203bd1998 Add TPU client for sending txs to the current leader tpu port (#16736) (#16762)
* Add TPU client for sending txs to the current leader tpu port

* Update tpu_client.rs

(cherry picked from commit 75b8434b76)

Co-authored-by: Justin Starry <justin@solana.com>
2021-04-23 02:50:30 +00:00
mergify[bot]
558a46f5d5 RPC: use finalized as default pubsub commitment level (#16659) (#16666)
* RPC: use finalized as default pubsub commitment level

* update docs

* Fix tests

(cherry picked from commit a7e65c0034)

Co-authored-by: Justin Starry <justin@solana.com>
2021-04-20 09:47:50 +00:00
mergify[bot]
b3488e0139 Cli: move airdrop to rpc requests (#16557) (#16564)
* Add recent_blockhash to requestAirdrop

* Move tx confirmation to separate method

* Add RpcClient airdrop methods

* Request cli airdrop via RpcClient

* Pass optional faucet_addr into TestValidator and fix tests

* Update client/src/rpc_client.rs

Co-authored-by: Michael Vines <mvines@gmail.com>

Co-authored-by: Michael Vines <mvines@gmail.com>
(cherry picked from commit 7dfb51c0b4)

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-04-15 07:35:19 +00:00
mergify[bot]
7475a6f444 makes turbine peer computation consistent between broadcast and retransmit (#14910) (#16143)
get_broadcast_peers is using tvu_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/broadcast_stage.rs#L362-L370
which is potentially inconsistent with retransmit_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1332-L1345

Also, the leader does not include its own contact-info when broadcasting
shreds:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1324
but on the retransmit side, slot leader is removed only _after_ neighbors and
children are computed:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/retransmit_stage.rs#L383-L384
So the turbine broadcast tree is different between the two stages.

This commit:
* Removes retransmit_peers. Broadcast and retransmit stages will use tvu_peers
  consistently.
* Retransmit stage removes slot leader _before_ computing children and
  neighbors.

(cherry picked from commit 570fd3f810)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-03-26 00:16:48 +00:00
behzad nouri
f2865dfd63 requires stakes for propagating crds values through gossip (#15561) 2021-03-12 15:50:14 +00:00
Justin Starry
918d04e3f0 Add more slot update notifications (#15734)
* Add more slot update notifications

* fix merge

* Address feedback and add integration test

* switch to datapoint

* remove unused shred method

* fix clippy

* new thread for rpc completed slots

* remove extra constant

* fixes

* rely on channel closing

* fix check
2021-03-12 21:44:06 +08:00
Michael Vines
5df36aec7d Pacify clippy 2021-02-19 20:08:41 -08:00
Trent Nelson
7f7370c306 Re-allow clippy::integer_arithmetic at crate-level 2021-02-17 13:55:08 -07:00
sakridge
b24cb9840e Speedup ledger cleanup test (#15304)
Just clone to produce shreds and use a separate insert thread.
2021-02-17 08:59:25 -08:00
Jeff Washington (jwash)
ba02452d75 add validator flag no-accounts-db-index-hashing (#15350)
* add validator flag no_accounts_db_index_hashing

* add validator flag no_accounts_db_index_hashing
2021-02-16 21:13:48 +00:00
sakridge
5b8f046c67 More configurable rocksdb compaction (#15213)
rocksdb compaction can cause long stalls, so
make it more configurable to try and reduce those stalls
and also to coordinate between multiple nodes to not induce
stall at the same time.
2021-02-14 10:16:30 -08:00
carllin
990bb426a9 Fix flaky test test_concurrent_snapshot_packaging (#15252) 2021-02-11 16:03:51 -08:00
Jeff Washington (jwash)
fabecdc86c use thread pool for non-index hash calculations (#15149) 2021-02-05 19:48:55 +00:00
behzad nouri
6fd5ec0e4c caches descendants in bank forks (#15107) 2021-02-05 18:00:45 +00:00
Tyera Eulberg
d1563f0ccd Bump tonic, prost, tarpc, tokio (#15013)
* Update tonic & prost, and regenerate proto

* Reignore doc code

* Revert pull #14367, but pin tokio to v0.2 for jsonrpc

* Bump backoff and goauth -> and therefore tokio

* Bump tokio in faucet, net-utils

* Bump remaining tokio, plus tarpc
2021-02-05 00:21:53 -07:00
Jeff Washington (jwash)
600ff0d915 calculate hash from store instead of index (#15034)
* calculate hash from store instead of index

* restore update hash in abs
2021-02-04 09:00:33 -06:00
behzad nouri
0ad063f4e9 adds flag to disable duplicate instance check (#15006) 2021-02-03 16:26:17 +00:00
Tyera Eulberg
98aa1fa4ea Upgrade jsonrpc crates to v17.0.0 (#15018)
* Upgrade to jsonrpc 17.0.0

* Fix test

* tree

Co-authored-by: Michael Vines <mvines@gmail.com>
2021-02-02 19:53:08 -07:00
Tom Parker-Shemilt
01230a0105 Remove serial_test_derive dependency (#14891) 2021-01-28 22:35:31 -07:00
Tyera Eulberg
ffa5c7dcc8 Deprecate commitment variants (#14797)
* Deprecate commitment variants

* Add new CommitmentConfig builders

* Add helpers to avoid allowing deprecated variants

* Remove deprecated transaction-status code

* Include new commitment variants in runtime commitment; allow deprecated as long as old variants persist

* Remove deprecated banks code

* Remove deprecated variants in core; allow deprecated in rpc/rpc-subscriptions for now

* Heavier hand with rpc/rpc-subscription commitment

* Remove deprecated variants from local-cluster

* Remove deprecated variants from various tools

* Remove deprecated variants from validator

* Update docs

* Remove deprecated client code

* Add new variants to cli; remove deprecated variants as possible

* Don't send new commitment variants to old clusters

* Retain deprecated method in test_validator_saves_tower

* Fix clippy matches! suggestion for BPF solana-sdk legacy compile test

* Refactor node version check to handle commitment variants and transaction encoding

* Hide deprecated variants from cli help

* Add cli App comments
2021-01-26 19:23:07 +00:00
behzad nouri
e1021d9f83 removes redundant epoch stakes cache in retransmit (#14781)
Following d6d76219b, staked nodes computed from vote accounts are
already cached in runtime::Stakes, so the caching in retransmit_stage is
redundant.
2021-01-24 21:15:09 +00:00
Michael Vines
bf1943e489 Add solana-test-validator --warp-slot argument 2021-01-22 21:17:02 -08:00
behzad nouri
8e581601d6 patches crds vote-index assignment bug (#14438)
If tower is full, old votes are evicted from the front of the deque:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L367-L373
whereas recent votes if expire are evicted from the back:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L529-L537

As a result, from a single tower_index scalar, we cannot infer which crds-vote
should be overwritten:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L576

In addition there is an off by one bug in the existing code. tower_index is
bounded by MAX_LOCKOUT_HISTORY - 1:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/consensus.rs#L382
So, it is at most 30, whereas MAX_VOTES is 32:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L29
Which means that this branch is never taken:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L590-L593
so crds table alwasys keeps 29 **oldest** votes by wallclock, and then
only overrides the 30st one each time. (i.e a tally of only two most
recent votes).
2021-01-21 13:08:07 +00:00
behzad nouri
b5fd0ed859 rewrites turbine retransmit peers computation (#14584) 2021-01-19 04:18:47 +00:00
carllin
6dfad0652f Cache account stores, flush from AccountsBackgroundService (#13140) 2021-01-11 17:00:23 -08:00
Michael Vines
a95675a7ce Avoid tmp snapshot backlog in SnapshotPackagerService under high load (#14516) 2021-01-11 10:21:15 -08:00
Michael Vines
7be6770808 Rename CompressionType to ArchiveFormat 2021-01-09 09:07:49 -08:00
carllin
5affd8aa72 Add secondary indexes (#14212) 2020-12-31 18:06:03 -08:00
behzad nouri
691031fefd limits number of crds values returned when responding to pull requests (#13739)
Crds values buffered when responding to pull-requests can be very large taking a lot of memory.
Added a limit for number of buffered crds values based on outbound data budget.
2020-12-18 18:45:12 +00:00
Michael Vines
7143aaa89b Clippy 2020-12-14 08:03:29 -08:00
carllin
55fc963595 Move slot cleanup to AccountsBackgroundService (#13911)
* Move bank drop to AccountsBackgroundService

* Send to ABS on drop instead, protects against other places banks are dropped

* Fix Abi

* test

Co-authored-by: Carl Lin <carl@solana.com>
2020-12-13 01:22:34 +00:00
Michael Vines
bbad3fe501 TestValidator now implements Drop, no need to close() it 2020-12-11 04:17:38 +00:00
Michael Vines
0a9ff1dc9d Initial solana-test-validator command-line program 2020-12-11 04:17:38 +00:00
Michael Vines
73111b005f Reduce the number of snapshots 2020-12-01 11:13:37 -08:00
Michael Vines
43b82b31e5 More TestValidator cleanup 2020-11-26 08:56:25 +00:00
Michael Vines
b5f7e39be8 TestValidator public interface cleanup 2020-11-25 17:04:37 -08:00
behzad nouri
b58f69297f makes crds fields private (#13703)
Crds fields should maintain several invariants between themselves, so
exposing them as public fields can be bug prone. In addition these
invariants are asserted on every write:
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262
which adds extra instructions and is not optimal. Should these fields be
private the asserts will be redundant.
2020-11-19 20:57:40 +00:00
Tyera Eulberg
ef99689592 Improve TestValidator instantiation (#13627)
* Add TestValidator::new_with_fees constructor, and warning for low bootstrap_validator_lamports

* Add logging to solana-tokens integration test to help catch low bootstrap_validator_lamports in the future

* Reasonable TestValidator mint_lamports
2020-11-16 23:27:36 -07:00