Commit Graph

293 Commits

Author SHA1 Message Date
Brooks Prumo
79ade5ec68 Add test_incremental_snapshot_download() to local-cluster (#19746) 2021-09-13 21:44:48 -05:00
Jeff Washington (jwash)
b57e86abf2 cache account hash info (#19426)
* cache account hash info

* ledger_path -> accounts_hash_cache_path
2021-09-13 20:39:26 -05:00
Brooks Prumo
3f6eb96d6e Call Bank::update_accounts_hash() before taking bank snapshot (#19765)
This is a bugfix.

When `load_frozen_forks()` called `snapshot_bank()`, the accounts hash
was not computed.  So then when a snapshot archive was eventually
created, its hash would be all 1s, which is clearly wrong.  The fix is
to update the bank's accounts hash before taking the bank snapshot.
2021-09-13 08:27:50 -05:00
Brooks Prumo
7aa5f6b833 Add CLI args for incremental snapshots (#19694)
Add `--incremental-snapshots` flag to enable incremental snapshots.
This will allow setting `--full-snapshot-interval-slots` and
`--incremental-snapshot-interval-slots`.

Also added `--maximum-incremental-snapshots-to-retain`.

Co-authored-by: Michael Vines <mvines@gmail.com>
2021-09-10 15:59:26 -05:00
Jeff Washington (jwash)
456bf15012 AccountsIndexConfig -> AccountsDbConfig (#19687) 2021-09-08 04:30:38 +00:00
Brooks Prumo
fe8ba81ce6 Rename to is_valid instead of is_invalid (#19670) 2021-09-07 09:31:54 -05:00
Brooks Prumo
9d9482b9d8 Plumb maximum_incremental_snapshot_archives_to_retain (#19640) 2021-09-06 18:01:56 -05:00
Brooks Prumo
5e25ee5ebe Add maximum_incremental_snapshot_archives_to_retain to SnapshotConfig (#19612) 2021-09-03 20:21:32 +00:00
Brooks Prumo
fb1b853b14 Make wait_for_restart_window() aware of Incremental Snapshots (#19587)
When the validator is waiting to restart, the snapshot slot is checked.
With Incremental Snapshots, the full snapshot interval is going to be
_much_ higher, which would make this function wait for a looooong time.

So, if there's an incremental snapshot slot when checking, use that
slot, otherwise fall back to the full snapshot slot.
2021-09-03 11:42:22 -05:00
Brooks Prumo
7ab0aec61f Rename maximum_full_snapshot_archives_to_retain (#19610)
To prepare for adding maximum_incremental_snapshot_archives_to_retain,
rename the current field in SnapshotConfig.
2021-09-03 11:28:10 -05:00
Brooks Prumo
8ac94b2cf4 Add Incremental Snapshot support to RPC (#19559)
#### Problem

There's no way to get incremental snapshot information from RPC.

#### Summary of Changes

- Add new RPC method, `getHighestSnapshotSlot` that returns a `SnapshotSlotInfo`, which contains both the highest full snapshot slot, and the highest incremental snapshot slot _based on_ the full snapshot.
- Deprecate old RPC method, `getSnapshotSlot`
- Update API docs

Fixes #19579
2021-09-02 15:25:42 -05:00
Lijun Wang
8378e8790f Accountsdb replication installment 2 (#19325)
This is the 2nd installment for the AccountsDb replication.

Summary of Changes

The basic google protocol buffer protocol for replicating updated slots and accounts. tonic/tokio is used for transporting the messages.

The basic framework of the client and server for replicating slots and accounts -- the persisting of accounts in the replica-side will be done at the next PR -- right now -- the accounts are streamed to the replica-node and dumped. Replication for information about Bank is also not done in this PR -- to be addressed in the next PR to limit the change size.

Functionality used by both the client and server side are encapsulated in the replica-lib crate.

There is no impact to the existing validator by default.

Tests:

Observe the confirmed slots replicated to the replica-node.
Observe the accounts for the confirmed slot are received at the replica-node side.
2021-09-01 14:10:16 -07:00
Brooks Prumo
1d5a8ebc6a Revert "Add LastFullSnapshotSlot to SnapshotConfig (#19341)" (#19529)
This reverts commit 4d361af976.
2021-08-31 22:03:19 -05:00
Brooks Prumo
6d939811e9 Name snapshots consistently (#19346)
#### Problem

Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:

```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```

#### Summary of Changes

For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.


Co-authored-by: Michael Vines <mvines@gmail.com>
2021-08-21 15:41:03 -05:00
Brooks Prumo
4d361af976 Add LastFullSnapshotSlot to SnapshotConfig (#19341) 2021-08-20 17:06:53 +00:00
Trent Nelson
e0bc5fa690 validator: Trusted validators are now called known validators 2021-08-19 22:43:49 -06:00
Jeff Washington (jwash)
7c70f2158b accounts_index_bins to AccountsIndexConfig (#19257)
* accounts_index_bins to AccountsIndexConfig

* rename param bins -> config

* rename BINS_FOR* to ACCOUNTS_INDEX_CONFIG_FOR*
2021-08-17 14:50:01 -05:00
Michael Vines
b15fa9fbd2 Add EtcdTowerStorage 2021-08-14 09:46:36 -07:00
Jeff Washington (jwash)
e91988c977 cli for num account index bins (#19085) 2021-08-11 11:45:25 -05:00
Michael Vines
e9722474eb Move tower storage into its own module 2021-08-11 00:20:46 -07:00
Brooks Prumo
fd937548a0 Move SnapshotArchiveInfo and friends into its own module (#19114) 2021-08-08 07:57:06 -05:00
Brooks Prumo
00890957ee Add snapshot_utils::bank_from_latest_snapshot_archives() (#18983)
While reviewing PR #18565, as issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots.  A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify.  Additionally, new tests have been
written and existing tests have been updated to use this new function.

Fixes #18973

While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled.  Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`.  And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats.  This bundling simplifies bank
rebuilding.
2021-08-06 20:16:06 -05:00
Michael Vines
397801a2d8 Extract tower storage details from Tower struct 2021-08-06 10:04:37 -07:00
Jeff Washington (jwash)
3280ae3e9f add validator option --accounts-db-skip-shrink (#19028)
* add validator option --accounts-db-skip-shrink

* typo
2021-08-04 17:28:33 -05:00
Brooks Prumo
ca14475085 Add incremental_snapshot_archive_interval_slots to SnapshotConfig (#19026)
This commit also renames `snapshot_interval_slots` to
`full_snapshot_archive_interval_slots`, updates the comments on the
fields, and make appropriate updates where SnapshotConfig is used.
2021-08-04 14:40:20 -05:00
Trent Nelson
71f6d839f9 validator: remove disused cuda config argument 2021-07-29 03:08:52 +00:00
Trent Nelson
8ed0cd0fff validator: check target CPU features earlier 2021-07-29 03:08:52 +00:00
Trent Nelson
ed8285c096 validator: start logging asap 2021-07-29 03:08:52 +00:00
behzad nouri
d2d5f36a3c adds validator flag to allow private ip addresses (#18850) 2021-07-23 15:25:03 +00:00
Brooks Prumo
d1debcd971 Add incremental snapshot utils (#18504)
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks.  This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.

Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code paths for the different snapshots.

Also of note, some renaming has happened:
  1. Snapshots are now either `full_` or `incremental_` throughout the
     codebase.  If not specified, the code applies to both.
  2. Bank snapshots now are called "bank snapshots"
     (before they were called "slot snapshots", "bank snapshots", or
      just "snapshots").  The one exception is within `Bank`, where they
     are still just "snapshots", because they are already "bank
     snapshots".
  3. Snapshot archives now have `_archive` in the code.  This
     should clear up an ambiguity between bank snapshots and snapshot
     archives.
2021-07-22 14:40:37 -05:00
Michael Vines
61865c0ee0 solana-validator set-identity now loads the tower file for the new identity 2021-07-21 22:22:08 -07:00
dependabot[bot]
6abafac479 chore: bump fd-lock from 2.0.0 to 3.0.0 (#18756)
* chore: bump fd-lock from 2.0.0 to 3.0.0

Bumps [fd-lock](https://github.com/yoshuawuyts/fd-lock) from 2.0.0 to 3.0.0.
- [Release notes](https://github.com/yoshuawuyts/fd-lock/releases)
- [Commits](https://github.com/yoshuawuyts/fd-lock/commits)

---
updated-dependencies:
- dependency-name: fd-lock
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Use new api

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-07-19 17:22:11 +00:00
Michael Vines
c418e8f370 wait-for-restart-window command now accepts an optional --identity argument 2021-07-14 23:01:14 -07:00
sakridge
7f2254225e Move entry/poh to own crate to speed up poh bench build (#18225) 2021-07-14 14:16:29 +02:00
Michael Vines
b6792a3328 Add ability to change the validator identity at runtime 2021-07-01 17:50:04 -07:00
Michael Vines
bf157506e8 Remove id ref 2021-07-01 17:50:04 -07:00
Brooks Prumo
45d54b1fc6 Add SnapshotArchiveInfo and refactor functions in snapshot_utils (#18232) 2021-07-01 12:20:56 -05:00
Brooks Prumo
89a3e4f91e Move SnapshotConfig into its own module (#18331)
Also move ArchiveFormat to snapshot_utils, and do not
reexport SnapshotVersion.
2021-07-01 08:55:26 -05:00
Michael Vines
314102cb54 Remove redundant JsonRpcConfig::identity_pubkey field 2021-06-22 17:20:11 -07:00
behzad nouri
58e115275a obtains shred-version from entrypoint's ip-echo-server in validator-main 2021-06-21 19:37:16 +00:00
behzad nouri
598093b5db adds shred-version to ip-echo-server response
When starting a validator, the node initially joins gossip with
shred_verison = 0, until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417

Depending on the load on the entrypoint, this adopting entrypoint
shred-version through gossip sometimes becomes very slow, and causes
several problems in gossip because we have to partially support
shred_version == 0 which is a source of leaking crds values from one
cluster to another. e.g. see
https://github.com/solana-labs/solana/pull/17899
and the other linked issues there.

In order to remove shred_version == 0 from gossip, this commit adds
shred-version to ip-echo-server response. Once the entrypoints are
updated, on validator start-up, if --expected_shred_version is not
specified we will obtain shred-version from the entrypoint using
ip-echo-server.
2021-06-21 19:37:16 +00:00
Alexander Meißner
6514096a67 chore: cargo +nightly clippy --fix -Z unstable-options 2021-06-18 10:42:46 -07:00
Trent Nelson
5efc48fc69 validator: expose max active pubsub subscriptions to CLI 2021-06-17 06:32:52 +00:00
Lijun Wang
269d995832 Make account shrink configurable #17544 (#17778)
1. Added both options for measuring space usage using total accounts usage and for individual store shrink ratio using an enum. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio
2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage
3. Added unit tests for the new functions added
4. The default implementations is kept at 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true

Fixes #17544
2021-06-09 21:21:32 -07:00
Tyera Eulberg
544b3c0d17 Create solana-poh and move remaining rpc modules to solana-rpc (#17698)
* Create solana-poh crate

* Move BigTableUploadService to solana-ledger

* Add solana-rpc to workspace

* Move dependencies to solana-rpc

* Move remaining rpc modules to solana-rpc

* Single use statement solana-poh

* Single use statement solana-rpc
2021-06-04 09:23:06 -06:00
dependabot[bot]
a43c29e858 Bump indicatif from 0.15.0 to 0.16.2 (#17628)
* Bump indicatif from 0.15.0 to 0.16.2

Bumps [indicatif](https://github.com/mitsuhiko/indicatif) from 0.15.0 to 0.16.2.
- [Release notes](https://github.com/mitsuhiko/indicatif/releases)
- [Commits](https://github.com/mitsuhiko/indicatif/compare/0.15.0...0.16.2)

Signed-off-by: dependabot[bot] <support@github.com>

* [auto-commit] Update all Cargo lock files

* Fix message types

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <dependabot-buildkite@noreply.solana.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-06-01 17:38:02 +00:00
Ryo Onodera
1f97b2365f Avoid full-range compactions with periodic filtered b.g. ones (#16697)
* Update rocksdb to v0.16.0

* Promote the infrequent and important log to info!

* Force background compaction by ttl without manual compaction

* Fix test

* Support no compaction mode in test_ledger_cleanup_compaction

* Fix comment

* Make compaction_interval customizable

* Avoid major compaction with periodic filtering...

* Adress lazy_static, special cfs and range check

* Clean up a bit and add comment

* Add comment

* More comments...

* Config code cleanup

* Add comment

* Use .conflicts_with()

* Nullify unneeded delete_range ops for special CFs

* Some clean ups

* Clarify the locking intention

* Ensure special CFs' consistency with PurgeType::CompactionFilter

* Fix comment

* Fix bad copy paste

* Fix various types...

* Don't use tuples

* Add a unit test for compaction_filter

* Fix typo...

* Remove flag and just use new behavior always

* Fix wrong condition negation...

* Doc. about no set_last_purged_slot in purge_slots

* Write a test and fix off-by-one bug....

* Apply suggestions from code review

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Follow up to github review suggestions

* Fix line-wrapping

* Fix conflict

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-05-28 16:42:56 +09:00
Lijun Wang
54f0fc9f0f Use type alias for DownloadProgress callback (#17518)
Convert to use type alias for the callback and cascade the changes to callers. Thanks @jeffwashington for the help making it possible.
Changed the closure for the progress update in the validator main to FnMut and modify the abort count in the closure which is more reliable.
2021-05-26 13:26:07 -07:00
Tyera Eulberg
9a5330b7eb Move gossip modules into solana-gossip crate (#17352)
* Move gossip modules to solana-gossip

* Update Protocol abi digest due to move

* Move gossip benches and hook up CI

* Remove unneeded Result entries

* Single use statements
2021-05-26 09:15:46 -06:00
Lijun Wang
4c17243157 snapshot download enhancement (#17415)
1. Allow the validator bootstrap code to specify the minimal snapshot download speed. If the snapshot download speed is detected below that, a different RPC can be retried. The default is 10MB/sec.

2. To prevent spinning on a number of sub-optimal choices and not making progress, the abort/retry logic is implemented with the following safe guards:
2.1 at maximum we do this retry for 5 times -- this number is configurable with default 5.
2.2 if the download in one notification round (5 second) is more than 2%, do not do retry -- it is not as bad anyway.
2.3 if the remaining estimate time is less than 1 minutes, do not abort retry as it will be done quickly anyway.
2.4 We do this abort/retry logic only at the first notification to avoid wasting download efforts -- the reasoning is being opportunistic and too greedy may not achieve overall shorter download time.

3. The download_snapshot and download_file is modified with the option allowing caller to notified of download progress via a callback. This allows the business logic of retrying to the place it belongs.
2021-05-25 09:32:12 -07:00