Cost model 1.7 (#20188)
* Cost Model to limit transactions which are not parallelizable (#16694)
* Add the following to banking_stage:
1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
2. CostTracker which is shared between threads, tracks transaction costs for each block.
* replace hard coded program ID with id() calls
* Add Account Access Cost as part of TransactionCost. Account access costs are weighted differently for reads vs. writes and signed vs. non-signed accounts.
* Establish instruction_execution_cost_table with a function to update or insert instruction costs, unit tested. It is read-only for now; it will allow Replay to insert realtime instruction execution costs into the table.
* add test for cost_tracker's atomic try_add operation; serves as a safety guard for future changes
* check cost against a local copy of cost_tracker and return transactions that would exceed the limit as unprocessed transactions to be buffered; only apply the cost of bank-processed transactions to the tracker (see the sketch after this commit message)
* add a bencher for banking_stage with a max cost limit so the cost model is hit consistently during bench iterations
* replay stage feed back program cost (#17731)
* replay stage feeds back realtime per-program execution cost to cost model;
* the program execution cost table is initialized empty, no longer populated with hardcoded numbers;
* changed cost unit to microseconds, using values collected from mainnet;
* add ExecuteCostTable with a fixed capacity for security reasons; when its limit is reached, programs that are both old and infrequently seen are evicted to make room for new programs.
* investigate system performance test degradation (#17919)
* Add stats and counters around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table
* Change mutex on cost_tracker to RwLock
* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.
* acquire and hold locks for a block of TXs, instead of acquiring and releasing per transaction;
* remove redundant would_fit check from the cost_tracker update execution path
* refactor cost checking with less frequent lock acquisition
* avoid repeated TransactionCost heap allocations when calculating cost, which
is in the hot path - executed per transaction.
* create hashmaps with preallocated capacity to reduce runtime heap reallocations.
* code review changes: categorize stats, replace explicit drop calls, concisely initialize to default
* address potential deadlock by acquiring locks one at a time
* Persist cost table to blockstore (#18123)
* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to the compaction exclusion list alongside TransactionStatusIndex in one place: `excludes_from_compaction()`
* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Delete programs from `ProgramCosts` in blockstore when they are removed from the in-memory cost_table
* Only try to persist to blockstore when the cost_table has changed.
* Restore cost table during validator startup
* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.
* log warning when channel send fails (#18391)
* Aggregate cost_model into cost_tracker (#18374)
* aggregate cost_model into cost_tracker and decouple it from banking_stage to prevent accidental deadlock; simplify code and remove unused functions
* review fixes
* update ledger tool to restore cost table from blockstore (#18489)
* update ledger tool to restore the cost model from blockstore when running compute-slot-cost
* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool
* refactor and simplify a test
* manually fix merge conflicts
* Per-program id timings (#17554)
* more manual fixing
* solve a merge conflict
* featurize cost model
* more merge fix
* cost model uses compute_unit to replace microsecond as cost unit (#18934)
* Reject blocks for costs above the max block cost (#18994)
* Update block max cost limit to fix performance regression (#19276)
* replace function with const var for better readability (#19285)
* Add a few more metrics data points (#19624)
* periodically report sigverify_stage stats (#19674)
* manual merge
* cost model nits (#18528)
* Accumulate consumed units (#18714)
* tx wide compute budget (#18631)
* more manual merge
* ignore zeroize drop security
* update const cost values with data collected by #19627
* update cost calculation to closely follow the proposed fee schedule in #16984
* add transaction cost histogram metrics (#20350)
* rebase to 1.7.15
* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes
* remove cost_model feature_set
* ignore vote transactions from cost model
Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
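
The per-transaction cost gating described above (estimate each transaction's cost, check it against a local view of the block's cost tracker, hand back transactions that would not fit as unprocessed, and only commit the costs of transactions the bank actually processed) can be sketched roughly as follows. The types and method signatures below are simplified stand-ins for illustration, not the exact validator API:

    // Minimal, self-contained sketch of the would_fit / try_add scheme.
    #[derive(Clone)]
    struct CostTracker {
        block_cost: u64,
        block_cost_limit: u64,
    }

    impl CostTracker {
        /// Would this additional cost still fit under the block limit?
        fn would_fit(&self, cost: u64) -> bool {
            self.block_cost.saturating_add(cost) <= self.block_cost_limit
        }

        /// Commit a cost to the running block total if it fits.
        fn try_add(&mut self, cost: u64) -> Result<u64, u64> {
            if self.would_fit(cost) {
                self.block_cost += cost;
                Ok(self.block_cost)
            } else {
                Err(self.block_cost)
            }
        }
    }

    /// Split transactions (by their estimated costs) into those selected for the
    /// current block and those returned as unprocessed, to be buffered and retried
    /// later. The check runs against a local copy, so nothing is committed yet.
    fn select_by_cost(tracker: &CostTracker, tx_costs: &[u64]) -> (Vec<usize>, Vec<usize>) {
        let mut local = tracker.clone();
        let (mut selected, mut unprocessed) = (Vec::new(), Vec::new());
        for (index, &cost) in tx_costs.iter().enumerate() {
            match local.try_add(cost) {
                Ok(_) => selected.push(index),
                Err(_) => unprocessed.push(index),
            }
        }
        (selected, unprocessed)
    }

    fn main() {
        let mut shared_tracker = CostTracker { block_cost: 0, block_cost_limit: 10 };
        let estimated_costs = [4, 5, 3];
        let (selected, unprocessed) = select_by_cost(&shared_tracker, &estimated_costs);
        assert_eq!(selected, vec![0, 1]); // 4 + 5 fits under the limit of 10
        assert_eq!(unprocessed, vec![2]); // adding 3 more would exceed it
        // Only after the bank has actually processed the selected transactions are
        // their costs applied to the shared tracker.
        for index in selected {
            shared_tracker.try_add(estimated_costs[index]).unwrap();
        }
        assert_eq!(shared_tracker.block_cost, 9);
    }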
//! This service receives instruction ExecuteTimings from replay_stage, updates the
//! cost_model that is shared with banking_stage to optimize packing transactions
//! into blocks, and triggers persisting the cost table to blockstore.
use solana_ledger::blockstore::Blockstore;
use solana_measure::measure::Measure;
use solana_runtime::{bank::Bank, bank::ExecuteTimings, cost_model::CostModel};
use solana_sdk::timing::timestamp;
use std::{
    sync::{
        atomic::{AtomicBool, Ordering},
        mpsc::Receiver,
        Arc, RwLock,
    },
    thread::{self, Builder, JoinHandle},
    time::Duration,
};
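
/// Timing and counter metrics for the cost update service, accumulated across
/// loop iterations and reported roughly once per second.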
#[derive(Default)]
pub struct CostUpdateServiceTiming {
    last_print: u64,
    update_cost_model_count: u64,
    update_cost_model_elapsed: u64,
    persist_cost_table_elapsed: u64,
}

impl CostUpdateServiceTiming {
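    /// Accumulates the given counters; once more than a second has passed since the
    /// last report, emits a `cost-update-service-stats` datapoint and resets itself.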
    fn update(
        &mut self,
        update_cost_model_count: u64,
        update_cost_model_elapsed: u64,
        persist_cost_table_elapsed: u64,
    ) {
        self.update_cost_model_count += update_cost_model_count;
        self.update_cost_model_elapsed += update_cost_model_elapsed;
        self.persist_cost_table_elapsed += persist_cost_table_elapsed;

        let now = timestamp();
        let elapsed_ms = now - self.last_print;
        if elapsed_ms > 1000 {
            datapoint_info!(
                "cost-update-service-stats",
                ("total_elapsed_us", elapsed_ms * 1000, i64),
                (
                    "update_cost_model_count",
                    self.update_cost_model_count as i64,
                    i64
                ),
                (
                    "update_cost_model_elapsed",
                    self.update_cost_model_elapsed as i64,
                    i64
                ),
                (
                    "persist_cost_table_elapsed",
                    self.persist_cost_table_elapsed as i64,
                    i64
                ),
            );

            *self = CostUpdateServiceTiming::default();
            self.last_print = now;
        }
    }
}
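
/// Messages consumed by the cost update service: a newly frozen bank whose cost
/// tracker stats should be reported, or `ExecuteTimings` gathered by replay whose
/// per-program timings feed back into the cost model.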
pub enum CostUpdate {
    FrozenBank { bank: Arc<Bank> },
    ExecuteTiming { execute_timings: ExecuteTimings },
}

pub type CostUpdateReceiver = Receiver<CostUpdate>;
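
/// Background service that drains `CostUpdate` messages on a dedicated thread,
/// keeps the shared `CostModel` up to date, and persists the instruction cost
/// table to the blockstore when it changes.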
pub struct CostUpdateService {
    thread_hdl: JoinHandle<()>,
}

impl CostUpdateService {
    #[allow(clippy::new_ret_no_self)]
    pub fn new(
        exit: Arc<AtomicBool>,
        blockstore: Arc<Blockstore>,
        cost_model: Arc<RwLock<CostModel>>,
        cost_update_receiver: CostUpdateReceiver,
    ) -> Self {
        let thread_hdl = Builder::new()
            .name("solana-cost-update-service".to_string())
            .spawn(move || {
                Self::service_loop(exit, blockstore, cost_model, cost_update_receiver);
            })
            .unwrap();

        Self { thread_hdl }
    }

    pub fn join(self) -> thread::Result<()> {
        self.thread_hdl.join()
    }
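
    /// Main loop: drain all pending `CostUpdate`s, fold execution timings into the
    /// cost model, persist the cost table to blockstore if it changed, then sleep
    /// for 100ms.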
    fn service_loop(
        exit: Arc<AtomicBool>,
        blockstore: Arc<Blockstore>,
        cost_model: Arc<RwLock<CostModel>>,
        cost_update_receiver: CostUpdateReceiver,
    ) {
        let mut cost_update_service_timing = CostUpdateServiceTiming::default();
        let mut dirty: bool;
        let mut update_count: u64;
        let wait_timer = Duration::from_millis(100);

        loop {
            if exit.load(Ordering::Relaxed) {
                break;
            }

            dirty = false;
            update_count = 0_u64;
            let mut update_cost_model_time = Measure::start("update_cost_model_time");
            for cost_update in cost_update_receiver.try_iter() {
                match cost_update {
                    CostUpdate::FrozenBank { bank } => {
                        bank.read_cost_tracker().unwrap().report_stats(bank.slot());
                    }
                    CostUpdate::ExecuteTiming { execute_timings } => {
                        dirty |= Self::update_cost_model(&cost_model, &execute_timings);
                        update_count += 1;
                    }
                }
            }
            update_cost_model_time.stop();

            let mut persist_cost_table_time = Measure::start("persist_cost_table_time");
            if dirty {
                Self::persist_cost_table(&blockstore, &cost_model);
            }
            persist_cost_table_time.stop();

            cost_update_service_timing.update(
                update_count,
                update_cost_model_time.as_us(),
                persist_cost_table_time.as_us(),
            );

            thread::sleep(wait_timer);
        }
    }
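
    /// Folds per-program execution timings into the cost model: for each program, the
    /// average units consumed per invocation is upserted into the instruction cost
    /// table. Returns true if the table was modified.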
    fn update_cost_model(cost_model: &RwLock<CostModel>, execute_timings: &ExecuteTimings) -> bool {
        let mut dirty = false;
        {
            let mut cost_model_mutable = cost_model.write().unwrap();
            for (program_id, timing) in &execute_timings.details.per_program_timings {
                if timing.count < 1 {
                    continue;
                }
                let units = timing.accumulated_units / timing.count as u64;
                match cost_model_mutable.upsert_instruction_cost(program_id, units) {
                    Ok(c) => {
                        debug!(
                            "after replayed into bank, instruction {:?} has averaged cost {}",
                            program_id, c
                        );
                        dirty = true;
                    }
                    Err(err) => {
                        debug!(
                            "after replayed into bank, instruction {:?} failed to update cost, err: {}",
                            program_id, err
                        );
                    }
                }
            }
        }
        debug!(
            "after replayed into bank, updated cost model instruction cost table, current values: {:?}",
            cost_model.read().unwrap().get_instruction_cost_table()
        );
        dirty
    }
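
    /// Writes the current instruction cost table to the blockstore's `ProgramCosts`
    /// column and deletes entries for programs that are no longer present in the table.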
    fn persist_cost_table(blockstore: &Blockstore, cost_model: &RwLock<CostModel>) {
        let cost_model_read = cost_model.read().unwrap();
        let cost_table = cost_model_read.get_instruction_cost_table();
        let db_records = blockstore.read_program_costs().expect("read programs");

        // delete records from blockstore if they are no longer in cost_table
        db_records.iter().for_each(|(pubkey, _)| {
            if cost_table.get(pubkey).is_none() {
                blockstore
                    .delete_program_cost(pubkey)
                    .expect("delete old program");
            }
        });

        for (key, cost) in cost_table.iter() {
            blockstore
                .write_program_cost(key, cost)
                .expect("persist program costs to blockstore");
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use solana_runtime::message_processor::ProgramTiming;
    use solana_sdk::pubkey::Pubkey;

    #[test]
    fn test_update_cost_model_with_empty_execute_timings() {
        let cost_model = Arc::new(RwLock::new(CostModel::default()));
        let empty_execute_timings = ExecuteTimings::default();
        CostUpdateService::update_cost_model(&cost_model, &empty_execute_timings);

        assert_eq!(
            0,
            cost_model
                .read()
                .unwrap()
                .get_instruction_cost_table()
                .len()
        );
    }

    #[test]
    fn test_update_cost_model_with_execute_timings() {
        let cost_model = Arc::new(RwLock::new(CostModel::default()));
        let mut execute_timings = ExecuteTimings::default();

        let program_key_1 = Pubkey::new_unique();
        let mut expected_cost: u64;

        // add new program
        {
            let accumulated_us: u64 = 1000;
            let accumulated_units: u64 = 100;
            let count: u32 = 10;
            expected_cost = accumulated_units / count as u64;

            execute_timings.details.per_program_timings.insert(
                program_key_1,
                ProgramTiming {
                    accumulated_us,
                    accumulated_units,
                    count,
                },
            );
            CostUpdateService::update_cost_model(&cost_model, &execute_timings);
            assert_eq!(
                1,
                cost_model
                    .read()
                    .unwrap()
                    .get_instruction_cost_table()
                    .len()
            );
            assert_eq!(
                Some(&expected_cost),
                cost_model
                    .read()
                    .unwrap()
                    .get_instruction_cost_table()
                    .get(&program_key_1)
            );
        }

        // update program
        {
            let accumulated_us: u64 = 2000;
            let accumulated_units: u64 = 200;
            let count: u32 = 10;
            // the expected new cost is the average of the new value and the existing value
            expected_cost = ((accumulated_units / count as u64) + expected_cost) / 2;

            execute_timings.details.per_program_timings.insert(
                program_key_1,
                ProgramTiming {
                    accumulated_us,
                    accumulated_units,
                    count,
                },
            );
            CostUpdateService::update_cost_model(&cost_model, &execute_timings);
            assert_eq!(
                1,
                cost_model
                    .read()
                    .unwrap()
                    .get_instruction_cost_table()
                    .len()
            );
            assert_eq!(
                Some(&expected_cost),
                cost_model
                    .read()
                    .unwrap()
                    .get_instruction_cost_table()
                    .get(&program_key_1)
            );
        }
    }
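
    // Added sketch under the assumption that both update() calls land within the
    // one-second reporting window: counters should accumulate across calls until
    // the periodic report fires and resets them.
    #[test]
    fn test_cost_update_service_timing_accumulates() {
        let mut timing = CostUpdateServiceTiming {
            last_print: timestamp(),
            ..CostUpdateServiceTiming::default()
        };
        timing.update(1, 10, 20);
        timing.update(2, 30, 40);
        assert_eq!(3, timing.update_cost_model_count);
        assert_eq!(40, timing.update_cost_model_elapsed);
        assert_eq!(60, timing.persist_cost_table_elapsed);
    }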
}