2948 Commits

Author SHA1 Message Date
97f2eb8e65 Banking stage: Deserialize packets only once
Benchmarks show roughly a 6% improvement. The impact could be more
significant when transactions need to be retried a lot.

after patch:
{'name': 'banking_bench_total', 'median': '72767.43'}
{'name': 'banking_bench_tx_total', 'median': '80240.38'}
{'name': 'banking_bench_success_tx_total', 'median': '72767.43'}
test bench_banking_stage_multi_accounts
... bench:   6,137,264 ns/iter (+/- 1,364,111)
test bench_banking_stage_multi_programs
... bench:  10,086,435 ns/iter (+/- 2,921,440)

before patch:
{'name': 'banking_bench_total', 'median': '68572.26'}
{'name': 'banking_bench_tx_total', 'median': '75704.75'}
{'name': 'banking_bench_success_tx_total', 'median': '68572.26'}
test bench_banking_stage_multi_accounts
... bench:   6,521,007 ns/iter (+/- 1,926,741)
test bench_banking_stage_multi_programs
... bench:  10,526,433 ns/iter (+/- 2,736,530)
2022-04-15 00:57:11 -06:00
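The "roughly 6%" figure can be checked directly from the numbers quoted in the commit message above. The sketch below is only a worked calculation; the helper name `percent_change` is invented for illustration and is not part of the repository.

```rust
/// Relative change of `after` versus `before`, in percent.
/// (Hypothetical helper, used only to check the quoted figures.)
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Medians copied from the commit message (higher is better).
    let total = percent_change(68572.26, 72767.43);
    let tx_total = percent_change(75704.75, 80240.38);

    // ns/iter from the two micro-benchmarks (lower is better,
    // so an improvement shows up as a negative change).
    let multi_accounts = percent_change(6_521_007.0, 6_137_264.0);
    let multi_programs = percent_change(10_526_433.0, 10_086_435.0);

    println!("banking_bench_total:      {total:+.2}%");
    println!("banking_bench_tx_total:   {tx_total:+.2}%");
    println!("multi_accounts (ns/iter): {multi_accounts:+.2}%");
    println!("multi_programs (ns/iter): {multi_programs:+.2}%");
}
```

The throughput medians rise by about 6%, while the ns/iter changes are smaller than the reported run-to-run variation (the `+/-` values), so the micro-benchmark deltas should be read with some caution.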
7a4a6597c0 Don't enforce ulimit for validator test config (#24272) 2022-04-12 22:06:37 +02:00
9b8850f99e test-validator: Add --max-compute-units flag (#24130)
* test-validator: Add `--max-compute-units` flag

* Add `RuntimeConfig` for tweaking runtime behavior

* Actually add the file

* Move RuntimeConfig to runtime
2022-04-12 02:28:10 +02:00
c1687b0604 Switch to await-aware tokio::sync::Mutex 2022-04-11 18:15:03 -04:00
60b2155bd3 Add accounts-filler-size command line option (#23896) 2022-04-11 13:10:09 -05:00
ff3b6d2b8b Remove duplicate increment (#24219) 2022-04-09 15:21:39 -05:00
a058f348a2 Address review comments 2022-04-08 14:37:55 -05:00
2ed29771f2 Unittest for cost tracker after process_and_record_transactions 2022-04-08 14:37:55 -05:00
924b8ea1eb Adjustments to cost_tracker updates
- don't store pending tx signatures and costs in CostTracker
- apply tx costs to global state immediately again
- go from commit_or_cancel to update_or_remove, where the cost tracker
  is either updated with the true costs for successful tx, or the costs
  of a retryable tx are removed
- move the function into qos_service and hold the cost tracker lock for
  the whole loop
2022-04-08 14:37:55 -05:00
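The update_or_remove flow described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the `CostTracker` fields, the `TxOutcome` enum, and the `settle_batch` function are all hypothetical stand-ins.

```rust
use std::sync::Mutex;

/// Result of executing one transaction (hypothetical type for this sketch).
enum TxOutcome {
    /// The transaction succeeded; its true cost is now known.
    Committed { actual_cost: u64 },
    /// The transaction will be retried later; its cost should not count yet.
    Retryable,
}

/// Simplified stand-in for a cost tracker: one running total.
#[derive(Default)]
struct CostTracker {
    block_cost: u64,
}

impl CostTracker {
    /// Apply an estimated cost to the global state immediately.
    fn add(&mut self, estimated: u64) {
        self.block_cost += estimated;
    }

    /// Replace the estimate with the true cost, or remove it entirely.
    fn update_or_remove(&mut self, estimated: u64, outcome: &TxOutcome) {
        let without_estimate = self.block_cost.saturating_sub(estimated);
        self.block_cost = match outcome {
            TxOutcome::Committed { actual_cost } => without_estimate + *actual_cost,
            TxOutcome::Retryable => without_estimate,
        };
    }
}

/// Take the lock once and hold it for the whole loop over the batch.
fn settle_batch(tracker: &Mutex<CostTracker>, batch: &[(u64, TxOutcome)]) {
    let mut guard = tracker.lock().unwrap();
    for (estimated, outcome) in batch {
        guard.update_or_remove(*estimated, outcome);
    }
}

fn main() {
    let tracker = Mutex::new(CostTracker::default());
    for estimated in [100, 200, 300] {
        tracker.lock().unwrap().add(estimated);
    }
    let batch = [
        (100, TxOutcome::Committed { actual_cost: 80 }),
        (200, TxOutcome::Retryable),
        (300, TxOutcome::Committed { actual_cost: 300 }),
    ];
    settle_batch(&tracker, &batch);
    println!("block cost after settling: {}", tracker.lock().unwrap().block_cost);
}
```

Holding one guard across the loop means other threads see either the costs before the batch or the costs after it, never a partially settled batch.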
9e07272af8 - Only commit successfully executed transactions' cost to cost_tracker;
- In-flight transactions are held as pending in cost_tracker until they are committed
  or cancelled;
2022-04-08 14:37:55 -05:00
d2702201ca Bump tonic, tonic-build, prost, and etcd-client (#24147)
* Bump tonic, prost, and etcd-client

* Restore doc ignores
2022-04-08 10:21:45 -06:00
210f6a6fab move hash calculation out of acct bg svc (#23689)
* move hash calculation out of acct bg svc

* pr feedback
2022-04-08 10:42:03 -05:00
1dd63631c0 Add high level overview comments on ledger_cleanup_service (#24184) 2022-04-08 00:49:21 -05:00
e105547c14 tvu and tpu timeout on joining its microservices (#24111)
* panic when test timeout

* nonblocking send when dropping banks

* debug log

* timeout for tvu

* unused variable

* timeout for tpu

* Revert "debug log"

This reverts commit da780a3301.

* add timeout const

* fix typo

* Revert "nonblocking send when dropping banks".
I will create another pull request for this.

This reverts commit 088c98ec0f.

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-07 20:20:13 -05:00
c27150b1a3 reserialize_bank_fields_with_hash (#23916)
* reserialize_bank_with_new_accounts_hash

* Update runtime/src/serde_snapshot.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/serde_snapshot/tests.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/serde_snapshot/tests.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* pr feedback

Co-authored-by: Brooks Prumo <brooks@prumo.org>
2022-04-07 14:05:57 -05:00
550ca7bf92 compare contents of serialized banks instead of exact file format (#24141)
* compare contents of serialized banks instead of exact file format

* Update runtime/src/snapshot_utils.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/snapshot_utils.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* pr feedback

* get rid of clone

* pr feedback

Co-authored-by: Brooks Prumo <brooks@prumo.org>
2022-04-06 21:55:44 -05:00
fddd162645 reserialize bank in ahv by first writing to temp file in abs (#23947) 2022-04-06 21:39:26 -05:00
fb67ff14de Remove replica-node crates (#24152) 2022-04-06 16:52:19 -06:00
afeb1d3cca Bump lru crate (#24150) 2022-04-06 16:18:42 -06:00
c322842257 Replace channel with Mutex<Option> for AccountsPackage (#24013) 2022-04-06 05:47:19 -05:00
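As a rough illustration of the `Mutex<Option>` pattern named in the commit title above, the sketch below shows a single-slot handoff in which a newer item replaces an unconsumed older one. That "latest wins" behavior is an assumption about why one might prefer a slot over a channel; the `Package` and `PendingPackage` types are hypothetical and do not mirror the real `AccountsPackage`.

```rust
use std::sync::Mutex;

/// Hypothetical payload standing in for a real package type.
#[derive(Debug, PartialEq)]
struct Package {
    slot: u64,
}

/// A one-element mailbox: at most one pending package at a time.
#[derive(Default)]
struct PendingPackage {
    inner: Mutex<Option<Package>>,
}

impl PendingPackage {
    /// Store `pkg`, returning the package it displaced, if any.
    fn submit(&self, pkg: Package) -> Option<Package> {
        self.inner.lock().unwrap().replace(pkg)
    }

    /// Remove and return the pending package, leaving the slot empty.
    fn take(&self) -> Option<Package> {
        self.inner.lock().unwrap().take()
    }
}

fn main() {
    let pending = PendingPackage::default();

    // The producer submits twice before the consumer wakes up.
    let displaced_first = pending.submit(Package { slot: 1 });
    let displaced_second = pending.submit(Package { slot: 2 });
    println!("displaced: {displaced_first:?}, then {displaced_second:?}");

    // The consumer sees only the most recent package, then an empty slot.
    println!("consumer took: {:?}", pending.take());
    println!("consumer took: {:?}", pending.take());
}
```

Unlike an unbounded channel, the slot never accumulates a backlog; the trade-off is that intermediate items are dropped, which is only acceptable when the consumer needs just the newest one.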
302142bb25 fix typo (#24123) 2022-04-05 15:55:47 -05:00
db23295e1c removes legacy weighted_shuffle and weighted_best methods (#24125)
Older weighted_shuffle is based on a heuristic which results in biased
samples as shown in:
https://github.com/solana-labs/solana/pull/18343
and can be replaced with WeightedShuffle.

Also, as described in:
https://github.com/solana-labs/solana/pull/13919
weighted_best can be replaced with rand::distributions::WeightedIndex,
or WeightedShuffle::first.
2022-04-05 19:19:22 +00:00
4ea59d8cb4 Set drop callback on first root bank (#23999) 2022-04-05 13:02:33 -05:00
2282571493 removes outdated and flaky test_skip_repair from retransmit-stage (#24121)
test_skip_repair in retransmit-stage is no longer relevant because
following: https://github.com/solana-labs/solana/pull/19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
https://github.com/solana-labs/solana/pull/24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
2022-04-05 16:02:53 +00:00
2b718d00b0 removes legacy compatibility turbine peers shuffle code 2022-04-05 12:04:12 +00:00
d0b850cdd9 removes turbine peers shuffle patch feature 2022-04-05 12:04:12 +00:00
855801cc95 removes deterministic-shred-seed feature 2022-04-05 12:04:12 +00:00
ee6bb0d5d3 track fec set turbine stats (#23989) 2022-04-04 14:44:21 -07:00
6ba4e870c4 Blockstore should drop signals before validator exit (#24025)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* let compiler infer mutex type

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-04 11:38:05 -05:00
7cb3b6cbe2 demotes WeightedShuffle failures to error metrics (#24079)
Since call-sites are calling unwrap anyways, panicking seems too punitive
for our use cases.
2022-04-03 16:20:06 +00:00
ffa4cafe1c Revert sequential execution of validator_exit and validator_parallel_exit tests (#24048)
* handle channel disconnect

* revert sequential execution of validator_exit and parallel_validator_exit tests
2022-04-02 10:22:47 -05:00
0b5ed87220 (LedgerStore) Enable performance sampling in column family get() (#23834)
#### Summary of Changes
This PR enables RocksDB read-side performance metrics, reported to blockstore_rocksdb_read_perf.
The sampling rate is controlled by the env var `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
which specifies the number of perf samples for every 1000 operations.  The default value is 10, meaning
we will report 10 out of 1000 (or 1/100) reads.

The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
They include many useful measurements, such as block read time, cache hit rate, and time spent decompressing blocks.
2022-04-01 13:13:32 -07:00
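The "N samples per 1000 operations" setting described above can be sketched as follows. Only the environment variable name and the default of 10 come from the commit message; the function names, the clamping to 1000, and the deterministic counter-based choice of which operations to sample are assumptions made for illustration (the real code may well sample randomly).

```rust
use std::env;

const ENV_VAR: &str = "SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K";
const DEFAULT_SAMPLES_IN_1K: u64 = 10;

/// Parse the raw setting, falling back to the default when it is
/// missing or not a number, and clamping to at most 1000.
fn parse_samples_in_1k(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .map(|n| n.min(1000))
        .unwrap_or(DEFAULT_SAMPLES_IN_1K)
}

/// Deterministic stand-in for sampling: the first `samples_in_1k`
/// operations of every block of 1000 are sampled.
fn should_sample(op_index: u64, samples_in_1k: u64) -> bool {
    op_index % 1000 < samples_in_1k
}

fn main() {
    let raw = env::var(ENV_VAR).ok();
    let samples_in_1k = parse_samples_in_1k(raw.as_deref());

    let total_ops = 100_000u64;
    let sampled = (0..total_ops)
        .filter(|&i| should_sample(i, samples_in_1k))
        .count();

    println!("{samples_in_1k} samples per 1000 ops: {sampled} of {total_ops} reads sampled");
}
```

With the default setting this samples 1 read in 100, which keeps the overhead of collecting perf counters low while still producing a steady stream of data points.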
df4d92f9cf Revert voting service to use UDP instead of QUIC (#24032) 2022-04-01 09:34:18 -07:00
51b37f0184 Modify rpc_completed_slot_service to be non-blocking (#24007)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-03-31 16:44:23 -05:00
9c8dad33c7 add epoch_schedule and rent_collector to hash calc (#24012) 2022-03-31 10:51:18 -05:00
da001d54e5 calculate_accounts_hash_helper uses config (#24003) 2022-03-31 09:29:45 -05:00
125f9634fd add hash calc config.use_write_cache (#24005) 2022-03-30 17:19:34 -05:00
ba770832d0 Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* improve test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slot that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redundant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extract send fn

* revert reporting_poh_timing_point

* align logging

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
c24de17278 remove index hash calculation as an option (#23928) 2022-03-25 15:32:53 -05:00
01af40d6b6 Fix intermittent validator_exit test failure (#23594)
* run validator_exit_test sequentially

* limit validator exit run to its own serial run subset
add 10ms delay in the validator exit tests

* fix intermittent validator exit failure

* no sleep

* undo the code move
2022-03-25 14:38:19 -05:00
6b85c2104c Implement forwarding via TpuConnection (#23817) 2022-03-25 11:31:40 -04:00
f44c8f296f fix: thread enforce_ulimit_nofile config down when opening blockstore (#23925) 2022-03-25 03:13:33 -05:00
51f5524e2f make verify_accounts_package_hash like other hash calc (#23906) 2022-03-24 17:49:48 -05:00
55d61023f7 document 'accounts' hash (#23907) 2022-03-24 15:58:52 -05:00
fedf4e984f typo (#23910) 2022-03-24 15:21:59 -05:00
37c36ce3fa pass stats separately from CalcAccountsHashConfig (#23892) 2022-03-24 12:48:47 -05:00
c31db81ac4 Use VoteAccountsHashMap type alias in all applicable spots (#23904) 2022-03-24 12:09:48 -05:00
82945ba973 Optimize TpuConnection and its implementations and refactor connection-cache to not use dyn in order to enable those changes (#23877) 2022-03-24 11:40:26 -04:00
5b916961b5 HashCalc uses self.accounts_cache (#23890) 2022-03-24 10:34:28 -05:00
b22165ad69 hash calc uses self.filler_account_suffix (#23887) 2022-03-24 09:58:06 -05:00