Cost model 1.7 (#20188)

* Cost Model to limit transactions which are not parallelizeable (#16694)

* * Add following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker atomically try_add operation, serves as safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations

* replay stage feed back program cost (#17731)

* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized into empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.

* investigate system performance test degradation  (#17919)

* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocation when calculate cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at time

* Persist cost table to blockstore (#18123)

* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.

* log warning when channel send fails (#18391)

* Aggregate cost_model into cost_tracker (#18374)

* * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions

* review fixes

* update ledger tool to restore cost table from blockstore (#18489)

* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test

* manually fix merge conflicts

* Per-program id timings (#17554)

* more manual fixing

* solve a merge conflict

* featurize cost model

* more merge fix

* cost model uses compute_unit to replace microsecond as cost unit
(#18934)

* Reject blocks for costs above the max block cost (#18994)

* Update block max cost limit to fix performance regession (#19276)

* replace function with const var for better readability (#19285)

* Add few more metrics data points (#19624)

* periodically report sigverify_stage stats (#19674)

* manual merge

* cost model nits (#18528)

* Accumulate consumed units (#18714)

* tx wide compute budget (#18631)

* more manual merge

* ignore zerorize drop security

* - update const cost values with data collected by #19627
- update cost calculation to closely proposed fee schedule #16984

* add transaction cost histogram metrics (#20350)

* rebase to 1.7.15

* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes

* remove cost_model feature_set

* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
This commit is contained in:
Tao Zhu
2021-10-06 15:11:41 -05:00
committed by Trent Nelson
parent a4df784e82
commit db85d659b9
40 changed files with 3208 additions and 266 deletions

View File

@@ -26,6 +26,7 @@ serde_json = "1.0.56"
serde_yaml = "0.8.13"
solana-clap-utils = { path = "../clap-utils", version = "=1.8.0" }
solana-cli-output = { path = "../cli-output", version = "=1.8.0" }
solana-core = { path = "../core", version = "=1.8.0" }
solana-ledger = { path = "../ledger", version = "=1.8.0" }
solana-logger = { path = "../logger", version = "=1.8.0" }
solana-measure = { path = "../measure", version = "=1.8.0" }

View File

@@ -15,6 +15,9 @@ use solana_clap_utils::{
is_parsable, is_pubkey, is_pubkey_or_keypair, is_slot, is_valid_percentage,
},
};
use solana_core::cost_model::CostModel;
use solana_core::cost_tracker::CostTracker;
use solana_core::cost_tracker_stats::CostTrackerStats;
use solana_ledger::entry::Entry;
use solana_ledger::{
ancestor_iterator::AncestorIterator,
@@ -727,6 +730,62 @@ fn load_bank_forks(
)
}
fn compute_slot_cost(blockstore: &Blockstore, slot: Slot) -> Result<(), String> {
if blockstore.is_dead(slot) {
return Err("Dead slot".to_string());
}
let (entries, _num_shreds, _is_full) = blockstore
.get_slot_entries_with_shred_info(slot, 0, false)
.map_err(|err| format!(" Slot: {}, Failed to load entries, err {:?}", slot, err))?;
let mut transactions = 0;
let mut programs = 0;
let mut program_ids = HashMap::new();
let mut cost_model = CostModel::default();
cost_model.initialize_cost_table(&blockstore.read_program_costs().unwrap());
let cost_model = Arc::new(RwLock::new(cost_model));
let mut cost_tracker = CostTracker::new(cost_model.clone());
let mut cost_tracker_stats = CostTrackerStats::default();
for entry in &entries {
transactions += entry.transactions.len();
let mut cost_model = cost_model.write().unwrap();
for transaction in &entry.transactions {
programs += transaction.message().instructions.len();
let tx_cost = cost_model.calculate_cost(transaction, true);
if cost_tracker
.try_add(tx_cost, &mut cost_tracker_stats)
.is_err()
{
println!(
"Slot: {}, CostModel rejected transaction {:?}, stats {:?}!",
slot,
transaction,
cost_tracker.get_stats()
);
}
for instruction in &transaction.message().instructions {
let program_id =
transaction.message().account_keys[instruction.program_id_index as usize];
*program_ids.entry(program_id).or_insert(0) += 1;
}
}
}
println!(
"Slot: {}, Entries: {}, Transactions: {}, Programs {}, {:?}",
slot,
entries.len(),
transactions,
programs,
cost_tracker.get_stats()
);
println!(" Programs: {:?}", program_ids);
Ok(())
}
fn open_genesis_config_by(ledger_path: &Path, matches: &ArgMatches<'_>) -> GenesisConfig {
let max_genesis_archive_unpacked_size =
value_t_or_exit!(matches, "max_genesis_archive_unpacked_size", u64);
@@ -1414,6 +1473,20 @@ fn main() {
.about("Output statistics in JSON format about \
all column families in the ledger rocksdb")
)
.subcommand(
SubCommand::with_name("compute-slot-cost")
.about("runs cost_model over the block at the given slots, \
computes how expensive a block was based on cost_model")
.arg(
Arg::with_name("slots")
.index(1)
.value_name("SLOTS")
.validator(is_slot)
.multiple(true)
.takes_value(true)
.help("Slots that their blocks are computed for cost, default to all slots in ledger"),
)
)
.get_matches();
info!("{} {}", crate_name!(), solana_version::version!());
@@ -2964,6 +3037,28 @@ fn main() {
));
println!("Ok.");
}
("compute-slot-cost", Some(arg_matches)) => {
let blockstore = open_blockstore(
&ledger_path,
AccessType::TryPrimaryThenSecondary,
wal_recovery_mode,
);
let mut slots: Vec<u64> = vec![];
if !arg_matches.is_present("slots") {
if let Ok(metas) = blockstore.slot_meta_iterator(0) {
slots = metas.map(|(slot, _)| slot).collect();
}
} else {
slots = values_t_or_exit!(arg_matches, "slots", Slot);
}
for slot in slots {
if let Err(err) = compute_slot_cost(&blockstore, slot) {
eprintln!("{}", err);
}
}
}
("", _) => {
eprintln!("{}", matches.usage());
exit(1);