Commit Graph

695 Commits

Author SHA1 Message Date
1f136de294 (LedgerStore) Report perf metrics for RocksDB deletes (#24138)
#### Summary of Changes
This PR enables perf metrics reporting for RocksDB deletes.
Samples are reported under "blockstore_rocksdb_write_perf" with op=delete
The sampling rate is still controlled by env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
and its default to 10 (meaning we report 10 in 1000 perf samples).
2022-04-08 00:18:05 -07:00
b84521d47d (LedgerStore) Report perf metrics for RocksDB write batch (#24061)
#### Summary of Changes
This PR enables perf metrics reporting for RocksDB write-batches.
Samples are reported under "blockstore_rocksdb_write_perf" with op=write_batch

Its cf_name tag is set to "write_batch" as well as each write-batch could include multiple column families.

The sampling rate is still controlled by env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
and its default to 10 (meaning we report 10 in 1000 perf samples).
2022-04-08 00:17:51 -07:00
206c3dd402 (LedgerStore) Enable RocksDB Perf metrics reporting for get_bytes and put_bytes (#24066)
#### Summary of Changes
Enable RocksDB Perf metrics reporting for get_bytes and put_bytes.
2022-04-07 00:24:10 -07:00
4f0e887702 (LedgerStore) Report RocksDB perf metrics for Protobuf Columns (#24065)
This PR enables the reporting of both RocksDB read and write perf metrics for ProtobufColumns,
including TransactionStatus and Rewards.
2022-04-07 00:15:00 -07:00
2d1f27ed8e (LedgerStore) Perf Metric for RocksDB Writes (#23951)
#### Summary of Changes
This PR implements the reporting of RocksDB write perf metrics to blockstore_rocksdb_write_perf
based on RocksDB's PerfContext.  The default sample rate is 10 in 1000, and the env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K can control the sample rate.
2022-04-06 12:12:38 -07:00
c322842257 Replace channel with Mutex<Option> for AccountsPackage (#24013) 2022-04-06 05:47:19 -05:00
24cc6c33de (LedgerStore)(Refactor) Move metric reporting functions to a dedicate mod (#24060)
Previously, the metric reporting functions are implemented under LedgerColumnMetric.
However, there're operations like write batch which is issued by the function inside Rocks.

This PR moves reporting functions to its own dedicate mod so that both LedgerColumn and
Rocks can report column perf metrics.
2022-04-05 15:06:17 -07:00
4ea59d8cb4 Set drop callback on first root bank (#23999) 2022-04-05 13:02:33 -05:00
eb65ffb779 Optimize replay waking up (#24051)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* increase replay wake up channel bounds

* reduce replay wakeup signal bound to 1

* reduce log level

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-05 08:57:12 -05:00
855801cc95 removes deterministic-shred-seed feature 2022-04-05 12:04:12 +00:00
ee6bb0d5d3 track fec set turbine stats (#23989) 2022-04-04 14:44:21 -07:00
6ba4e870c4 Blockstore should drop signals before validator exit (#24025)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* let compiler infer mutex type

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-04 11:38:05 -05:00
0b5ed87220 (LedgerStore) Enable performance sampling in column family get() (#23834)
#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by an env arg `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
specifies the number of perf samples for every 1000 operations.  The default value is set to 10, meaning
we will report 10 out of 1000 (or 1/100) reads.

The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
It includes many useful metrics including block read time, cache hit rate, and time spent on decompressing the block.
2022-04-01 13:13:32 -07:00
ba770832d0 Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* imrove test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slot that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redudant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extrac send fn

* revert reporting_poh_timing_point

* align loggin

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
cda3d66b21 uses first_coding_index for erasure meta obtained from coding shreds (#23974)
Now that nodes correctly populate position field in coding shreds, and
first_coding_index in erasure meta, the old code to maintain backward
compatibility can be removed.
The commit is working towards changing erasure coding schema to 32:64.
2022-03-30 13:55:11 +00:00
c24de17278 remove index hash calculation as an option (#23928) 2022-03-25 15:32:53 -05:00
1f9c89c1e8 expands lifetime of SlotStats (#23872)
Current slot stats are removed when the slot is full or every 30 seconds
if the slot is before root:
https://github.com/solana-labs/solana/blob/493a8e234/ledger/src/blockstore.rs#L2017-L2027

In order to track if the slot is ultimately marked as dead or rooted and
emit more metrics, this commit expands lifetime of SlotStats while
bounding total size of cache using an LRU eviction policy.
2022-03-25 19:32:22 +00:00
c31db81ac4 Use VoteAccountsHashMap type alias in all applicable spots (#23904) 2022-03-24 12:09:48 -05:00
c83c95b56b (LedgerStore) Create ColumnMetrics trait for CF metric reporting (#23763)
This PR does a refactoring on column family-related metrics reporting.
As the metric reporting is per column family basis, the PR creates
ColumnMetrics trait and move the metric reporting logic into it.

This refactoring will make future column metric reporting (such as
read PerfContext) much cleaner.
2022-03-23 20:51:49 -07:00
7af48465fa transaction-status: Add return data to meta (#23688)
* transaction-status: Add return data to meta

* Add return data to simulation results

* Use pretty-hex for printing return data

* Update arg name, make TransactionRecord struct

* Rename TransactionRecord -> ExecutionRecord
2022-03-22 23:17:05 +01:00
ae75b1a25f (LedgerStore) Add compression type (#23578)
This PR adds `--rocksdb-ledger-compression` as a hidden argument to the validator
for specifying the compression algorithm for TransactionStatus.  Available compression
algorithms include `lz4`, `snappy`, `zlib`. The default value is `none`.

Experimental results show that with lz4 compression, we can achieve ~37% size-reduction
on the TransactionStatus column family, or ~8% size-reduction of the ledger store size.
2022-03-22 02:27:09 -07:00
f999eef452 (LedgerStore) Rename BlockstoreAdvancedOptions to LedgerColumnOptions (#23764)
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
2022-03-18 11:13:35 -07:00
6b0d34d70d removes redundant Arcs from Blockstore (#23735) 2022-03-17 19:43:57 +00:00
86c695268e (LedgerStore) Improve the function API of new_cf_descriptor (#23696)
As we start adding more options into BlockstoreOptions, it's better to allow
new_cf_descriptor to take the reference to BlockstoreOptions so that
we can avoid future function API changes on new_cf_descriptor.
2022-03-16 11:47:49 -07:00
390c5667f7 Setup bank hard_forks in load_bank_forks() 2022-03-15 23:08:07 -07:00
e7cf84a593 Remove AccessType arg from create_new_ledger (#23591)
Creating a new ledger implicitly means that no other process could have
previously held access to it. Additionally, creating a new ledger
implicitly requires writing, so it follows that Primary access is
required and we can drop access type as an argument.
2022-03-15 00:07:16 -05:00
390dc24608 Create leader schedule before processing blockstore 2022-03-14 15:29:58 -07:00
115f376465 Factor out bank_forks_utils::load_bank_forks() 2022-03-14 15:29:58 -07:00
63324be5b3 Remove last_full_snapshot_slot return value when it can be derived by the caller 2022-03-14 15:29:58 -07:00
cf5048faaa Eliminate to_loadresult() 2022-03-14 15:29:58 -07:00
3d4bf1d00a Refactor bank_forks_utils::load() to invoke a common process_blockstore_from_root() 2022-03-14 15:29:58 -07:00
c2ce152be8 Inline do_process_blockstore_from_root 2022-03-14 15:29:58 -07:00
07d5ee062d Push recyclers down the stack 2022-03-14 15:29:58 -07:00
fc2d1a61f3 signing_coding_time is not used (#23655) 2022-03-14 17:16:30 -05:00
1e20bd8f9a (LedgerStore) Include storage type as a tag in RocksDB metric reporting (#23523)
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.

To make things extensible, this PR introduces BlockstoreAdvancedOptions and move shred_storage_type. 
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.

Dependency: #23580
2022-03-11 15:17:34 -08:00
99f1a22b9d (LedgerStore) Generalize RocksDB metric reporting macro (#23580)
#### Summary of Changes

This PR generalizes RocksDB metric reporting macro so that future RocksDB metrics can
reuse the same macro.
2022-03-10 23:13:59 -08:00
83f5f8bfc3 Move test_purge_huge test (#23587)
* ignore test_purge_huge tests it is expensive.

* move test_purge to integration tests
2022-03-10 15:31:43 -06:00
3c6840050c Ensure blocks do not exceed the max accounts data size during Replay Stage (#23422) 2022-03-10 10:24:31 -06:00
58c0db9704 Cleanup several blockstore functions (#23390)
* Rename excludes_from_compaction to should_exclude_from_compaction
* Make subfunction to create all cf descriptors
* Condense logic for when to disable compactions
2022-03-10 02:08:38 -06:00
5599bd9442 Remove BankFromArchiveTimings from ledger/ 2022-03-08 08:11:50 -08:00
9cfa21f7d1 Inline load_from_genesis 2022-03-08 08:11:50 -08:00
b8b7163b66 (Ledger Store) Report RocksDB Column Family Metrics (#22503)
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
2022-03-05 16:13:03 -08:00
36ad59673c drop mut 2022-03-04 09:52:46 +01:00
0d33b54d74 Rework do_process_blockstore_from_root to use BankForks 2022-03-04 09:52:46 +01:00
93c8e04d51 Simplify do_process_blockstore_from_root slightly 2022-03-04 09:52:46 +01:00
62d2a4cd88 Make ShredStorageType::RocksLevel public (#23272)
#### Summary of Changes
This PR adds two hidden arguments to the validator that allow users to use RocksDB's FIFO compaction for storing shreds.

        --shred-storage <SHRED_STORAGE>
            EXPERIMENTAL: Controls how RocksDB compacts shreds.  *WARNING*: You will lose your ledger data
            when you switch between options. Possible values are: 'level': stores shreds using RocksDB's default (level)
            compaction. 'fifo': stores shreds under RocksDB's FIFO compaction. This option is more efficient on
            disk-write-bytes of the ledger store. [default: level]  [possible values: level, fifo]

        --shred-storage-size <SHRED_STORAGE_SIZE_BYTES>
            The shred storage size in bytes. The suggested value is 50% of your ledger storage size in bytes. [default:
            268435456000]
2022-03-03 12:43:58 -08:00
634f4eb37d (LedgerStore) Use different path for different blockstore storage type. (#23236)
#### Summary of Changes
To avoid mixing the use of different shred storage types, each shred storage type
will have its blockstore in a different directory.

This PR still keeps the RocksFifo setting hidden.  The default ShredStorageType and
blockstore directory are still RocksLevel and `rocksdb`.

Will follow-up with PRs on making FIFO option public in ledger-tool and validator.

#### Test Plan
* Added a new test to verify the existence of `rocksdb-fifo` directory when FIFO compaction is used.
* Updated existing test to verify the current setting still store ledger under `rocksdb` directory.
* Manually ran ledger_cleanup_test with both level and fifo compaction and verified the resulting ledger.
* Ran a validator with this PR.
2022-03-02 18:30:22 -08:00
ef8b7d9c62 ledger tool halt at slot verify hash (#23424) 2022-03-02 11:11:18 -06:00
3b5b71ce44 Improve UX querying rpc for blocks at or before the snapshot from boot (#23403)
* Bump first-available block to first complete block

* Remove obsolete purges in tests (PrimaryIndex toggling no longer in use

* Check first-available block in Rpc check_slot_cleaned_up
2022-02-28 23:57:41 -07:00
39c86c6c44 fmt (#23329) 2022-02-24 12:12:29 -06:00