Persistent tower (#10718)

* Save/restore Tower

* Avoid unwrap()

* Rebase cleanups

* Forcibly pass test

* Correct reconcilation of votes after validator resume

* d b g

* Add more tests

* fsync and fix test

* Add test

* Fix fmt

* Debug

* Fix tests...

* save

* Clarify error message and code cleaning around it

* Move most of code out of tower save hot codepath

* Proper comment for the lack of fsync on tower

* Clean up

* Clean up

* Simpler type alias

* Manage tower-restored ancestor slots without banks

* Add comment

* Extract long code blocks...

* Add comment

* Simplify returned tuple...

* Tweak too aggresive log

* Fix typo...

* Add test

* Update comment

* Improve test to require non-empty stray restored slots

* Measure tower save and dump all tower contents

* Log adjust and add threshold related assertions

* cleanup adjust

* Properly lower stray restored slots priority...

* Rust fmt

* Fix test....

* Clarify comments a bit and add TowerError::TooNew

* Further clean-up arround TowerError

* Truly create ancestors by excluding last vote slot

* Add comment for stray_restored_slots

* Add comment for stray_restored_slots

* Use BTreeSet

* Consider root_slot into post-replay adjustment

* Tweak logging

* Add test for stray_restored_ancestors

* Reorder some code

* Better names for unit tests

* Add frozen_abi to SavedTower

* Fold long lines

* Tweak stray ancestors and too old slot history

* Re-adjust error conditon of too old slot history

* Test normal ancestors is checked before stray ones

* Fix conflict, update tests, adjust behavior a bit

* Fix test

* Address review comments

* Last touch!

* Immediately after creating cleaning pr

* Revert stray slots

* Revert comment...

* Report error as metrics

* Revert not to panic! and ignore unfixable test...

* Normalize lockouts.root_slot more strictly

* Add comments for panic! and more assertions

* Proper initialize root without vote account

* Clarify code and comments based on review feedback

* Fix rebase

* Further simplify based on assured tower root

* Reorder code for more readability

Co-authored-by: Michael Vines <mvines@gmail.com>
This commit is contained in:
Ryo Onodera
2020-09-19 14:03:54 +09:00
committed by GitHub
parent 28f2c15597
commit cb8661bd49
15 changed files with 1712 additions and 106 deletions

View File

@ -793,6 +793,12 @@ pub fn main() {
.takes_value(false)
.help("Use CUDA"),
)
.arg(
clap::Arg::with_name("require_tower")
.long("require-tower")
.takes_value(false)
.help("Refuse to start if saved tower state is not found"),
)
.arg(
Arg::with_name("expected_genesis_hash")
.long("expected-genesis-hash")
@ -1015,6 +1021,7 @@ pub fn main() {
let restricted_repair_only_mode = matches.is_present("restricted_repair_only_mode");
let mut validator_config = ValidatorConfig {
require_tower: matches.is_present("require_tower"),
dev_halt_at_slot: value_t!(matches, "dev_halt_at_slot", Slot).ok(),
expected_genesis_hash: matches
.value_of("expected_genesis_hash")