Cost model 1.7 (#20188)

* Cost Model to limit transactions which are not parallelizable (#16694)

* Add the following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account access costs are weighted differently for read vs. write and for signed vs. non-signed accounts.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker's atomic try_add operation; it serves as a safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher creates banking_stage with max cost limit so that the cost model is hit consistently during bench iterations

* replay stage feed back program cost (#17731)

* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized as an empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security reasons; when its limit is reached, programs that are both older AND less frequently seen are pushed out to make room for new programs.

* investigate system performance test degradation (#17919)

* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocations when calculating cost, which
is in the hot path (executed per transaction).

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at a time

* Persist cost table to blockstore (#18123)

* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to the compaction exclusion list alongside TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Delete programs from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.

* log warning when channel send fails (#18391)

* Aggregate cost_model into cost_tracker (#18374)

* aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock.

* Simplified code, removed unused functions

* review fixes

* update ledger tool to restore cost table from blockstore (#18489)

* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test

* manually fix merge conflicts

* Per-program id timings (#17554)

* more manual fixing

* solve a merge conflict

* featurize cost model

* more merge fix

* cost model uses compute_unit to replace microsecond as cost unit (#18934)

* Reject blocks for costs above the max block cost (#18994)

* Update block max cost limit to fix performance regression (#19276)

* replace function with const var for better readability (#19285)

* Add few more metrics data points (#19624)

* periodically report sigverify_stage stats (#19674)

* manual merge

* cost model nits (#18528)

* Accumulate consumed units (#18714)

* tx wide compute budget (#18631)

* more manual merge

* ignore zeroize drop security

* update const cost values with data collected by #19627

* update cost calculation to closely follow the proposed fee schedule #16984

* add transaction cost histogram metrics (#20350)

* rebase to 1.7.15

* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes

* remove cost_model feature_set

* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
This commit is contained in:
Tao Zhu
2021-10-06 15:11:41 -05:00
committed by Trent Nelson
parent a4df784e82
commit db85d659b9
40 changed files with 3208 additions and 266 deletions
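
The commit message describes a cost tracker shared between banking threads behind an `RwLock`, with the lock acquired once per block of transactions instead of once per transaction, and with transactions that would exceed a limit returned for later retry. A minimal sketch of that idea follows. It is not the code from this commit: `BlockCostTracker`, `CostError`, `add_batch`, the string account keys, and the specific per-account limit are hypothetical names and simplifications chosen for illustration.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Why a transaction was not added (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum CostError {
    WouldExceedBlockLimit,
    WouldExceedAccountLimit,
}

/// Hypothetical, simplified per-block cost tracker.
/// Accounts are plain strings here; the real code uses pubkeys.
#[derive(Debug)]
pub struct BlockCostTracker {
    block_cost_limit: u64,
    account_cost_limit: u64,
    block_cost: u64,
    cost_by_writable_account: HashMap<String, u64>,
}

impl BlockCostTracker {
    pub fn new(block_cost_limit: u64, account_cost_limit: u64) -> Self {
        Self {
            block_cost_limit,
            account_cost_limit,
            block_cost: 0,
            cost_by_writable_account: HashMap::new(),
        }
    }

    pub fn block_cost(&self) -> u64 {
        self.block_cost
    }

    /// Check both limits without mutating any state.
    pub fn would_fit(&self, writable: &[&str], cost: u64) -> Result<(), CostError> {
        if self.block_cost.saturating_add(cost) > self.block_cost_limit {
            return Err(CostError::WouldExceedBlockLimit);
        }
        for key in writable {
            let current = self.cost_by_writable_account.get(*key).copied().unwrap_or(0);
            if current.saturating_add(cost) > self.account_cost_limit {
                return Err(CostError::WouldExceedAccountLimit);
            }
        }
        Ok(())
    }

    /// Check and update in one step; state changes only if the cost fits.
    pub fn try_add(&mut self, writable: &[&str], cost: u64) -> Result<u64, CostError> {
        self.would_fit(writable, cost)?;
        self.block_cost += cost;
        for key in writable {
            *self
                .cost_by_writable_account
                .entry((*key).to_string())
                .or_insert(0) += cost;
        }
        Ok(self.block_cost)
    }
}

/// Take the write lock once for the whole batch and return the indices of
/// transactions that did not fit, so the caller can buffer them for retry.
pub fn add_batch(tracker: &RwLock<BlockCostTracker>, txs: &[(Vec<&str>, u64)]) -> Vec<usize> {
    let mut guard = tracker.write().unwrap();
    let mut retryable = Vec::new();
    for (index, (writable, cost)) in txs.iter().enumerate() {
        if guard.try_add(writable, *cost).is_err() {
            retryable.push(index);
        }
    }
    retryable
}
```

Limiting the cost accumulated per writable account is one way to bound work that cannot run in parallel, since transactions writing the same account must execute serially; the actual limits and weights used by the cost model are defined in the changed files, not here.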


@@ -38,9 +38,11 @@ full = [
 ]
 [dependencies]
-assert_matches = { version = "1.3.0", optional = true }
-bincode = "1.3.1"
-bs58 = "0.3.1"
+assert_matches = { version = "1.5.0", optional = true }
+bincode = "1.3.3"
+borsh = "0.9.0"
+borsh-derive = "0.9.0"
+bs58 = "0.4.0"
 bv = { version = "0.11.1", features = ["serde"] }
 byteorder = { version = "1.3.4", optional = true }
 chrono = { version = "0.4", optional = true }

sdk/src/compute_budget.rs Normal file

@@ -0,0 +1,142 @@
#![cfg(feature = "full")]

use crate::{
    process_instruction::BpfComputeBudget,
    transaction::{Transaction, TransactionError},
};
use borsh::{BorshDeserialize, BorshSchema, BorshSerialize};
use solana_sdk::{
    borsh::try_from_slice_unchecked,
    instruction::{Instruction, InstructionError},
};

crate::declare_id!("ComputeBudget111111111111111111111111111111");

const MAX_UNITS: u64 = 1_000_000;

/// Compute Budget Instructions
#[derive(
    Serialize,
    Deserialize,
    BorshSerialize,
    BorshDeserialize,
    BorshSchema,
    Debug,
    Clone,
    PartialEq,
    AbiExample,
    AbiEnumVisitor,
)]
pub enum ComputeBudgetInstruction {
    /// Request a specific maximum number of compute units the transaction is
    /// allowed to consume.
    RequestUnits(u64),
}

/// Create a `ComputeBudgetInstruction::RequestUnits` `Instruction`
pub fn request_units(units: u64) -> Instruction {
    Instruction::new_with_borsh(id(), &ComputeBudgetInstruction::RequestUnits(units), vec![])
}

pub fn process_request(
    compute_budget: &mut BpfComputeBudget,
    tx: &Transaction,
) -> Result<(), TransactionError> {
    let error = TransactionError::InstructionError(0, InstructionError::InvalidInstructionData);
    // Compute budget instruction must be in 1st or 2nd instruction (avoid nonce marker)
    for instruction in tx.message().instructions.iter().take(2) {
        if check_id(instruction.program_id(&tx.message().account_keys)) {
            let ComputeBudgetInstruction::RequestUnits(units) =
                try_from_slice_unchecked::<ComputeBudgetInstruction>(&instruction.data)
                    .map_err(|_| error.clone())?;
            if units > MAX_UNITS {
                return Err(error);
            }
            compute_budget.max_units = units;
        }
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::{
        compute_budget, hash::Hash, message::Message, pubkey::Pubkey, signature::Keypair,
        signer::Signer,
    };

    #[test]
    fn test_process_request() {
        let payer_keypair = Keypair::new();
        let mut compute_budget = BpfComputeBudget::default();

        let tx = Transaction::new(
            &[&payer_keypair],
            Message::new(&[], Some(&payer_keypair.pubkey())),
            Hash::default(),
        );
        process_request(&mut compute_budget, &tx).unwrap();
        assert_eq!(compute_budget, BpfComputeBudget::default());

        let tx = Transaction::new(
            &[&payer_keypair],
            Message::new(
                &[
                    compute_budget::request_units(1),
                    Instruction::new_with_bincode(Pubkey::new_unique(), &0, vec![]),
                ],
                Some(&payer_keypair.pubkey()),
            ),
            Hash::default(),
        );
        process_request(&mut compute_budget, &tx).unwrap();
        assert_eq!(
            compute_budget,
            BpfComputeBudget {
                max_units: 1,
                ..BpfComputeBudget::default()
            }
        );

        let tx = Transaction::new(
            &[&payer_keypair],
            Message::new(
                &[
                    compute_budget::request_units(MAX_UNITS + 1),
                    Instruction::new_with_bincode(Pubkey::new_unique(), &0, vec![]),
                ],
                Some(&payer_keypair.pubkey()),
            ),
            Hash::default(),
        );
        let result = process_request(&mut compute_budget, &tx);
        assert_eq!(
            result,
            Err(TransactionError::InstructionError(
                0,
                InstructionError::InvalidInstructionData
            ))
        );

        let tx = Transaction::new(
            &[&payer_keypair],
            Message::new(
                &[
                    Instruction::new_with_bincode(Pubkey::new_unique(), &0, vec![]),
                    compute_budget::request_units(MAX_UNITS),
                ],
                Some(&payer_keypair.pubkey()),
            ),
            Hash::default(),
        );
        process_request(&mut compute_budget, &tx).unwrap();
        assert_eq!(
            compute_budget,
            BpfComputeBudget {
                max_units: MAX_UNITS,
                ..BpfComputeBudget::default()
            }
        );
    }
}
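
The scanning rule in `process_request` above (only the first two instructions are inspected, a request above `MAX_UNITS` is rejected, and a later request within those two overrides an earlier one) can be sketched without the SDK types. `Ix`, `BudgetError`, and `effective_max_units` are hypothetical stand-ins for illustration only; the default budget is passed in as a parameter rather than taken from `BpfComputeBudget::default()`.

```rust
/// Upper bound on requested units, same value as `MAX_UNITS` in the diff.
pub const MAX_UNITS: u64 = 1_000_000;

/// Hypothetical stand-in for an already-decoded instruction.
#[derive(Debug, Clone, PartialEq)]
pub enum Ix {
    /// A compute budget request for the given number of units.
    RequestUnits(u64),
    /// Any instruction addressed to another program.
    Other,
}

/// Hypothetical error type mirroring `InvalidInstructionData`.
#[derive(Debug, Clone, PartialEq)]
pub enum BudgetError {
    InvalidInstructionData,
}

/// Return the compute unit cap for a transaction: the default unless one of
/// the first two instructions requests a different value.
pub fn effective_max_units(
    default_units: u64,
    instructions: &[Ix],
) -> Result<u64, BudgetError> {
    let mut max_units = default_units;
    // Only the 1st and 2nd instructions are considered; later ones are ignored.
    for ix in instructions.iter().take(2) {
        if let Ix::RequestUnits(units) = ix {
            if *units > MAX_UNITS {
                return Err(BudgetError::InvalidInstructionData);
            }
            max_units = *units;
        }
    }
    Ok(max_units)
}
```

As in the unit test above, a request in the second position is honored, while a request placed third or later has no effect and leaves the default in place.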


@@ -175,10 +175,6 @@ pub mod stake_merge_with_unmatched_credits_observed {
     solana_sdk::declare_id!("meRgp4ArRPhD3KtCY9c5yAf2med7mBLsjKTPeVUHqBL");
 }
-pub mod gate_large_block {
-    solana_sdk::declare_id!("2ry7ygxiYURULZCrypHhveanvP5tzZ4toRwVp89oCNSj");
-}
 pub mod mem_overlap_fix {
     solana_sdk::declare_id!("vXDCFK7gphrEmyf5VnKgLmqbdJ4UxD2eZH1qbdouYKF");
 }
@@ -223,6 +219,14 @@ pub mod optimize_epoch_boundary_updates {
     solana_sdk::declare_id!("265hPS8k8xJ37ot82KEgjRunsUp5w4n4Q4VwwiN9i9ps");
 }
+pub mod tx_wide_compute_cap {
+    solana_sdk::declare_id!("5ekBxc8itEnPv4NzGJtr8BVVQLNMQuLMNQQj7pHoLNZ9");
+}
+pub mod gate_large_block {
+    solana_sdk::declare_id!("2ry7ygxiYURULZCrypHhveanvP5tzZ4toRwVp89oCNSj");
+}
 lazy_static! {
     /// Map of feature identifiers to user-visible description
     pub static ref FEATURE_NAMES: HashMap<Pubkey, &'static str> = [
@@ -267,7 +271,6 @@ lazy_static! {
         (merge_nonce_error_into_system_error::id(), "merge NonceError into SystemError"),
         (spl_token_v2_set_authority_fix::id(), "spl-token set_authority fix"),
         (stake_merge_with_unmatched_credits_observed::id(), "allow merging active stakes with unmatched credits_observed #18985"),
-        (gate_large_block::id(), "validator checks block cost against max limit in realtime, reject if exceeds."),
         (mem_overlap_fix::id(), "Memory overlap fix"),
         (close_upgradeable_program_accounts::id(), "enable closing upgradeable program accounts"),
         (stake_program_advance_activating_credits_observed::id(), "Enable advancing credits observed for activation epoch #19309"),
@@ -279,6 +282,8 @@ lazy_static! {
         (stakes_remove_delegation_if_inactive::id(), "remove delegations from stakes cache when inactive"),
         (send_to_tpu_vote_port::id(), "Send votes to the tpu vote port"),
         (optimize_epoch_boundary_updates::id(), "Optimize epoch boundary updates"),
+        (tx_wide_compute_cap::id(), "Transaction wide compute cap"),
+        (gate_large_block::id(), "validator checks block cost against max limit in realtime, reject if exceeds."),
         /*************** ADD NEW FEATURES HERE ***************/
     ]
     .iter()


@@ -15,6 +15,7 @@ pub mod arithmetic;
 pub mod builtins;
 pub mod client;
 pub mod commitment_config;
+pub mod compute_budget;
 pub mod derivation_path;
 pub mod deserialize_utils;
 pub mod entrypoint;


@@ -147,7 +147,7 @@ pub fn get_sysvar<T: Sysvar>(
     })
 }
-#[derive(Clone, Copy, Debug, AbiExample)]
+#[derive(Clone, Copy, Debug, AbiExample, PartialEq)]
 pub struct BpfComputeBudget {
     /// Number of compute units that an instruction is allowed. Compute units
     /// are consumed by program execution, resources they use, etc...