Cost model 1.7 (#20188)
* Cost Model to limit transactions which are not parallelizable (#16694)
  * Add the following to banking_stage:
    1. CostModel as an immutable ref shared between threads, to provide estimated cost for transactions.
    2. CostTracker, which is shared between threads and tracks transaction costs for each block.
  * replace hard-coded program ID with id() calls
  * Add Account Access Cost as part of TransactionCost. Account access costs are weighted differently between read and write, signed and non-signed.
  * Establish instruction_execution_cost_table; add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.
  * add test for cost_tracker atomic try_add operation, which serves as a safety guard for future changes
  * check cost against local copy of cost_tracker; return transactions that would exceed the limit as unprocessed transactions to be buffered; only apply the cost of bank-processed transactions to the tracker
  * bencher to construct new banking_stage with max cost limit, so the cost model is hit consistently during bench iterations
* replay stage feed back program cost (#17731)
  * replay stage feeds back realtime per-program execution cost to cost model
  * program cost execution table is initialized as an empty table, no longer populated with hardcoded numbers
  * changed cost unit to microsecond, using values collected from mainnet
  * add ExecuteCostTable with fixed capacity for security reasons; when its limit is reached, programs with old age AND fewer occurrences are pushed out to make room for new programs
* investigate system performance test degradation (#17919)
  * Add stats and counters around cost model ops, mainly:
    - calculate transaction cost
    - check transaction can fit in a block
    - update block cost tracker after transactions are added to block
    - replay_stage to update/insert execution cost to table
  * Change mutex on cost_tracker to RwLock
  * removed cloning cost_tracker for local use, as the metrics show clone is very expensive
  * acquire and hold locks for a block of TXs, instead of acquiring and releasing per transaction
  * remove redundant would_fit check from cost_tracker update execution path
  * refactor cost checking with less frequent lock acquiring
  * avoid many Transaction_cost heap allocations when calculating cost, which is in the hot path (executed per transaction)
  * create hashmap with new_capacity to reduce runtime heap realloc
  * code review changes: categorize stats, replace explicit drop calls, concisely initialize to default
  * address potential deadlock by acquiring locks one at a time
* Persist cost table to blockstore (#18123)
  * Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
  * Add ProgramCosts to the compaction exclusion list alongside TransactionStatusIndex in one place: `excludes_from_compaction()`
  * Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
  * Delete programs from `ProgramCosts` in blockstore when they are removed from cost_table in memory
  * Only try to persist to blockstore when cost_table is changed
  * Restore cost table during validator startup
  * Offload `cost_model` related operations from replay main thread to a dedicated service thread; add channel to send execute_timings between these threads
  * Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model
* log warning when channel send fails (#18391)
* Aggregate cost_model into cost_tracker (#18374)
  * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock
  * Simplified code, removed unused functions
  * review fixes
* update ledger tool to restore cost table from blockstore (#18489)
  * update ledger tool to restore cost model from blockstore when compute-slot-cost
  * Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool
* refactor and simplify a test
* manually fix merge conflicts
* Per-program id timings (#17554)
* more manual fixing
* solve a merge conflict
* featurize cost model
* more merge fix
* cost model uses compute_unit to replace microsecond as cost unit (#18934)
* Reject blocks for costs above the max block cost (#18994)
* Update block max cost limit to fix performance regression (#19276)
* replace function with const var for better readability (#19285)
* Add few more metrics data points (#19624)
* periodically report sigverify_stage stats (#19674)
* manual merge
* cost model nits (#18528)
* Accumulate consumed units (#18714)
* tx wide compute budget (#18631)
* more manual merge
* ignore zeroize drop security
* update const cost values with data collected by #19627; update cost calculation to closely follow proposed fee schedule #16984
* add transaction cost histogram metrics (#20350)
* rebase to 1.7.15
* add tx count and thread id to stats (#20451); each stat reports and resets when slot changes
* remove cost_model feature_set
* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
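The commit message describes a CostTracker that checks whether a transaction fits in the current block and returns the ones that would exceed the limit as unprocessed, so they can be buffered. A minimal single-threaded sketch of that check-then-add pattern follows, assuming two limits: a per-block limit and a per-writable-account limit (the latter is what caps transactions that cannot run in parallel because they write the same account). `BlockCostTracker`, `AccountId`, and the limit values are hypothetical, not the API added by this commit.

```rust
use std::collections::HashMap;

/// Hypothetical account identifier standing in for `Pubkey`.
pub type AccountId = u32;

/// Hypothetical per-block cost tracker (not the commit's `CostTracker` API).
pub struct BlockCostTracker {
    account_cost_limit: u64,
    block_cost_limit: u64,
    block_cost: u64,
    // Cost accumulated so far by each writable account in this block.
    account_costs: HashMap<AccountId, u64>,
}

impl BlockCostTracker {
    pub fn new(account_cost_limit: u64, block_cost_limit: u64) -> Self {
        Self {
            account_cost_limit,
            block_cost_limit,
            block_cost: 0,
            account_costs: HashMap::new(),
        }
    }

    pub fn block_cost(&self) -> u64 {
        self.block_cost
    }

    /// True if a transaction of `cost` that writes `writable_accounts`
    /// stays within the block limit and every per-account limit.
    pub fn would_fit(&self, writable_accounts: &[AccountId], cost: u64) -> bool {
        if self.block_cost.saturating_add(cost) > self.block_cost_limit {
            return false;
        }
        writable_accounts.iter().all(|account| {
            let used = self.account_costs.get(account).copied().unwrap_or(0);
            used.saturating_add(cost) <= self.account_cost_limit
        })
    }

    /// Check-and-add in one step. On `Err` the tracker is left unchanged,
    /// so the caller can keep the transaction buffered for a later block.
    pub fn try_add(
        &mut self,
        writable_accounts: &[AccountId],
        cost: u64,
    ) -> Result<u64, &'static str> {
        if !self.would_fit(writable_accounts, cost) {
            return Err("would exceed cost limit");
        }
        for account in writable_accounts {
            *self.account_costs.entry(*account).or_insert(0) += cost;
        }
        self.block_cost += cost;
        Ok(self.block_cost)
    }
}
```

With `BlockCostTracker::new(10, 25)`, a cost-8 write to account 1 succeeds, a second cost-5 write to account 1 is rejected by the per-account limit even though the block still has room, and a write to a different account is accepted. Since the commit shares the tracker between banking threads behind an `RwLock`, a real caller would hold the write lock across the whole `try_add` so the check and the update cannot interleave with another thread.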
core/src/execute_cost_table.rs (normal file, 279 lines added)
@@ -0,0 +1,279 @@
//! ExecuteCostTable is aggregated by the cost model. It keeps each program's
//! average cost in a HashMap with a fixed capacity, so the table cannot grow
//! unchecked.
//! When the capacity limit is reached, it prunes old and less-used programs
//! to make room for new ones.
use log::*;
use solana_sdk::pubkey::Pubkey;
use std::{collections::HashMap, time::SystemTime};

// Pruning is a rather expensive operation, so it is more efficient to free up
// space in bulk each time. PRUNE_RATIO defines the table size after a prune:
// new_size = original_size * PRUNE_RATIO.
const PRUNE_RATIO: f64 = 0.75;
// With 50_000 TPS as the norm, each occurrence is weighted as 100 microseconds
// of age when computing a program's weighted age (see `prune_to`).
const OCCURRENCES_WEIGHT: i64 = 100;

const DEFAULT_CAPACITY: usize = 1024;

#[derive(Debug)]
pub struct ExecuteCostTable {
    capacity: usize,
    table: HashMap<Pubkey, u64>,
    occurrences: HashMap<Pubkey, (usize, SystemTime)>,
}

impl Default for ExecuteCostTable {
    fn default() -> Self {
        ExecuteCostTable::new(DEFAULT_CAPACITY)
    }
}

impl ExecuteCostTable {
    pub fn new(cap: usize) -> Self {
        Self {
            capacity: cap,
            table: HashMap::with_capacity(cap),
            occurrences: HashMap::with_capacity(cap),
        }
    }

    pub fn get_cost_table(&self) -> &HashMap<Pubkey, u64> {
        &self.table
    }

    pub fn get_count(&self) -> usize {
        self.table.len()
    }

    // Instead of assigning an unknown program a configured/hard-coded cost,
    // use the average or mode function to make an educated guess.
    pub fn get_average(&self) -> u64 {
        if self.table.is_empty() {
            0
        } else {
            self.table.values().sum::<u64>() / self.get_count() as u64
        }
    }

    // Returns the cost of the most frequently executed program; ties on the
    // occurrence count go to the most recently updated program.
    pub fn get_mode(&self) -> u64 {
        if self.occurrences.is_empty() {
            0
        } else {
            let key = self
                .occurrences
                .iter()
                .max_by_key(|(_, (count, timestamp))| (*count, *timestamp))
                .map(|(key, _)| key)
                .expect("cannot find mode from cost table");

            *self.table.get(key).unwrap()
        }
    }

    // Returns None if the program doesn't exist in the table. In that case,
    // the client is advised to call `get_average()` or `get_mode()` to
    // assign a 'default' value for the new program.
    pub fn get_cost(&self, key: &Pubkey) -> Option<&u64> {
        self.table.get(key)
    }

    // Inserts a new program, or folds `value` into the existing estimate as a
    // running average ((old + new) / 2). Returns the updated cost.
    pub fn upsert(&mut self, key: &Pubkey, value: u64) -> Option<u64> {
        let need_to_add = !self.table.contains_key(key);
        let current_size = self.get_count();
        if current_size >= self.capacity && need_to_add {
            self.prune_to(&((current_size as f64 * PRUNE_RATIO) as usize));
        }

        let program_cost = self.table.entry(*key).or_insert(value);
        *program_cost = (*program_cost + value) / 2;

        let (count, timestamp) = self
            .occurrences
            .entry(*key)
            .or_insert((0, SystemTime::now()));
        *count += 1;
        *timestamp = SystemTime::now();

        Some(*program_cost)
    }

    // Prunes the table down to `new_size` records, evicting the "oldest"
    // programs first. Age here is a weighted age score, which decreases with
    // the time since the program was last updated and increases with how
    // frequently the program has been executed (i.e. its occurrence count);
    // programs with the lowest score are removed first.
    fn prune_to(&mut self, new_size: &usize) {
        debug!(
            "prune cost table, current size {}, new size {}",
            self.get_count(),
            new_size
        );

        // Nothing to do if the table already fits; without this guard a
        // `new_size` larger than the table would empty it.
        if *new_size >= self.get_count() {
            return;
        }

        if *new_size == 0 {
            self.table.clear();
            self.occurrences.clear();
            return;
        }

        let now = SystemTime::now();
        let mut sorted_by_weighted_age: Vec<_> = self
            .occurrences
            .iter()
            .map(|(key, (count, timestamp))| {
                // SystemTime is not monotonic; treat a timestamp in the future
                // (clock moved backwards) as age zero instead of panicking.
                let age = now
                    .duration_since(*timestamp)
                    .unwrap_or_default()
                    .as_micros();
                let weighted_age = *count as i64 * OCCURRENCES_WEIGHT - age as i64;
                (weighted_age, *key)
            })
            .collect();
        sorted_by_weighted_age.sort_by_key(|(weighted_age, _)| *weighted_age);

        for (_, key) in sorted_by_weighted_age.iter() {
            self.table.remove(key);
            self.occurrences.remove(key);
            if *new_size == self.get_count() {
                break;
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_execute_cost_table_prune_simple_table() {
        solana_logger::setup();
        let capacity: usize = 3;
        let mut testee = ExecuteCostTable::new(capacity);

        let key1 = Pubkey::new_unique();
        let key2 = Pubkey::new_unique();
        let key3 = Pubkey::new_unique();

        testee.upsert(&key1, 1);
        testee.upsert(&key2, 2);
        testee.upsert(&key3, 3);

        testee.prune_to(&(capacity - 1));

        // the oldest, key1, should be pruned
        assert!(testee.get_cost(&key1).is_none());
        assert!(testee.get_cost(&key2).is_some());
        assert!(testee.get_cost(&key3).is_some());
    }

    #[test]
    fn test_execute_cost_table_prune_weighted_table() {
        solana_logger::setup();
        let capacity: usize = 3;
        let mut testee = ExecuteCostTable::new(capacity);

        let key1 = Pubkey::new_unique();
        let key2 = Pubkey::new_unique();
        let key3 = Pubkey::new_unique();

        testee.upsert(&key1, 1);
        testee.upsert(&key1, 1);
        testee.upsert(&key2, 2);
        testee.upsert(&key3, 3);

        testee.prune_to(&(capacity - 1));

        // the oldest, key1, has 2 counts; the 2nd oldest, key2, has 1 count;
        // expect key2 to be pruned.
        assert!(testee.get_cost(&key1).is_some());
        assert!(testee.get_cost(&key2).is_none());
        assert!(testee.get_cost(&key3).is_some());
    }

    #[test]
    fn test_execute_cost_table_upsert_within_capacity() {
        solana_logger::setup();
        let mut testee = ExecuteCostTable::default();

        let key1 = Pubkey::new_unique();
        let key2 = Pubkey::new_unique();
        let cost1: u64 = 100;
        let cost2: u64 = 110;

        // query empty table
        assert!(testee.get_cost(&key1).is_none());

        // insert one record
        testee.upsert(&key1, cost1);
        assert_eq!(1, testee.get_count());
        assert_eq!(cost1, testee.get_average());
        assert_eq!(cost1, testee.get_mode());
        assert_eq!(&cost1, testee.get_cost(&key1).unwrap());

        // insert 2nd record
        testee.upsert(&key2, cost2);
        assert_eq!(2, testee.get_count());
        assert_eq!((cost1 + cost2) / 2_u64, testee.get_average());
        assert_eq!(cost2, testee.get_mode());
        assert_eq!(&cost1, testee.get_cost(&key1).unwrap());
        assert_eq!(&cost2, testee.get_cost(&key2).unwrap());

        // update 1st record
        testee.upsert(&key1, cost2);
        assert_eq!(2, testee.get_count());
        assert_eq!(((cost1 + cost2) / 2 + cost2) / 2, testee.get_average());
        assert_eq!((cost1 + cost2) / 2, testee.get_mode());
        assert_eq!(&((cost1 + cost2) / 2), testee.get_cost(&key1).unwrap());
        assert_eq!(&cost2, testee.get_cost(&key2).unwrap());
    }

    #[test]
    fn test_execute_cost_table_upsert_exceeds_capacity() {
        solana_logger::setup();
        let capacity: usize = 2;
        let mut testee = ExecuteCostTable::new(capacity);

        let key1 = Pubkey::new_unique();
        let key2 = Pubkey::new_unique();
        let key3 = Pubkey::new_unique();
        let key4 = Pubkey::new_unique();
        let cost1: u64 = 100;
        let cost2: u64 = 110;
        let cost3: u64 = 120;
        let cost4: u64 = 130;

        // insert one record
        testee.upsert(&key1, cost1);
        assert_eq!(1, testee.get_count());
        assert_eq!(&cost1, testee.get_cost(&key1).unwrap());

        // insert 2nd record
        testee.upsert(&key2, cost2);
        assert_eq!(2, testee.get_count());
        assert_eq!(&cost1, testee.get_cost(&key1).unwrap());
        assert_eq!(&cost2, testee.get_cost(&key2).unwrap());

        // insert 3rd record, pushes out the oldest (i.e. the 1st) record
        testee.upsert(&key3, cost3);
        assert_eq!(2, testee.get_count());
        assert_eq!((cost2 + cost3) / 2_u64, testee.get_average());
        assert_eq!(cost3, testee.get_mode());
        assert!(testee.get_cost(&key1).is_none());
        assert_eq!(&cost2, testee.get_cost(&key2).unwrap());
        assert_eq!(&cost3, testee.get_cost(&key3).unwrap());

        // update 2nd record, so the 3rd becomes the oldest;
        // add 4th record, which pushes out the 3rd key
        testee.upsert(&key2, cost1);
        testee.upsert(&key4, cost4);
        assert_eq!(((cost1 + cost2) / 2 + cost4) / 2_u64, testee.get_average());
        assert_eq!((cost1 + cost2) / 2, testee.get_mode());
        assert_eq!(2, testee.get_count());
        assert!(testee.get_cost(&key1).is_none());
        assert_eq!(&((cost1 + cost2) / 2), testee.get_cost(&key2).unwrap());
        assert!(testee.get_cost(&key3).is_none());
        assert_eq!(&cost4, testee.get_cost(&key4).unwrap());
    }
}
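The eviction rule in `prune_to` and the averaging in `upsert` can be checked by hand. The sketch below re-implements only the scoring, assuming explicit ages in microseconds instead of `SystemTime` readings so the result is deterministic; `Entry`, `weighted_age`, `survivors`, and `running_average` are hypothetical helpers, and `u32` keys stand in for `Pubkey`.

```rust
/// Mirrors OCCURRENCES_WEIGHT: one occurrence offsets 100 microseconds of age.
const OCCURRENCES_WEIGHT: i64 = 100;

/// Hypothetical snapshot of one table entry.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Entry {
    pub key: u32,
    pub count: i64,
    pub age_micros: i64,
}

/// Same score as `prune_to`: more occurrences raise it, more age lowers it.
pub fn weighted_age(e: &Entry) -> i64 {
    e.count * OCCURRENCES_WEIGHT - e.age_micros
}

/// Keys that remain after pruning to `new_size`, lowest scores evicted first.
/// Returned sorted by key so the result is easy to compare.
pub fn survivors(entries: &[Entry], new_size: usize) -> Vec<u32> {
    let mut sorted = entries.to_vec();
    sorted.sort_by_key(weighted_age);
    let evict = sorted.len().saturating_sub(new_size);
    let mut kept: Vec<u32> = sorted[evict..].iter().map(|e| e.key).collect();
    kept.sort_unstable();
    kept
}

/// The update applied by `upsert`: each new sample moves the estimate halfway.
pub fn running_average(old: u64, sample: u64) -> u64 {
    (old + sample) / 2
}
```

For the scenario in `test_execute_cost_table_prune_weighted_table`, with assumed ages of 30, 20, and 10 microseconds, key1 (2 occurrences) scores 170, key2 scores 80, and key3 scores 90, so pruning to two entries evicts key2. Because one occurrence is worth only 100 microseconds of age, recency dominates the score once programs were last updated more than a fraction of a millisecond apart.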