solana/core/src/tpu.rs
Tao Zhu db85d659b9 Cost model 1.7 (#20188)
* Cost Model to limit transactions which are not parallelizable (#16694)

* Add the following to banking_stage:
  1. CostModel, an immutable ref shared between threads, to provide the estimated cost of transactions.
  2. CostTracker, shared between threads, which tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account access costs are weighted differently for read vs. write and for signed vs. non-signed accounts.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker's atomic try_add operation; it serves as a safety guard for future changes

* check cost against local copy of cost_tracker and return transactions that would exceed the limit as unprocessed transactions to be buffered; only apply the cost of bank-processed transactions to the tracker

* bench the new banking_stage with a max cost limit so that the cost model is hit consistently during bench iterations
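
The check-then-add flow described in this group can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `SketchCostTracker`, its fields, the string account keys, and the per-account write limit are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified tracker (not the CostTracker from this commit).
#[derive(Debug, Default)]
pub struct SketchCostTracker {
    block_cost_limit: u64,
    account_cost_limit: u64,
    block_cost: u64,
    cost_by_writable_account: HashMap<String, u64>,
}

impl SketchCostTracker {
    pub fn new(block_cost_limit: u64, account_cost_limit: u64) -> Self {
        Self {
            block_cost_limit,
            account_cost_limit,
            ..Self::default()
        }
    }

    /// Returns true if a transaction with this cost fits under both the
    /// block limit and the (assumed) per-writable-account limit.
    /// Assumes `writable` contains no duplicate keys.
    pub fn would_fit(&self, writable: &[&str], cost: u64) -> bool {
        if self.block_cost.saturating_add(cost) > self.block_cost_limit {
            return false;
        }
        writable.iter().all(|key| {
            let current = self
                .cost_by_writable_account
                .get(*key)
                .copied()
                .unwrap_or(0);
            current.saturating_add(cost) <= self.account_cost_limit
        })
    }

    /// Checks and updates in one call, so the tracker is never left
    /// partially updated when the transaction does not fit.
    pub fn try_add(&mut self, writable: &[&str], cost: u64) -> Result<u64, &'static str> {
        if !self.would_fit(writable, cost) {
            return Err("would exceed limit");
        }
        self.block_cost += cost;
        for key in writable {
            *self
                .cost_by_writable_account
                .entry((*key).to_string())
                .or_insert(0) += cost;
        }
        Ok(self.block_cost)
    }
}
```

A caller would treat an `Err` from `try_add` as "buffer this transaction for a later block" rather than as a failure.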

* replay stage feed back program cost (#17731)

* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized as an empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security reasons; when its limit is reached, programs that are both old AND infrequently seen will be pushed out to make room for new programs.
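
A fixed-capacity table with eviction could look something like the sketch below. The names (`SketchCostTable`, `upsert`) and the eviction rule (fewest occurrences first, ties broken by oldest last-seen tick) are assumptions for illustration; the commit only states that old and rarely seen programs are pushed out.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
struct Entry {
    cost: u64,
    occurrences: u64,
    last_seen: u64,
}

/// Hypothetical fixed-capacity cost table (not the ExecuteCostTable itself).
pub struct SketchCostTable {
    capacity: usize,
    tick: u64,
    entries: HashMap<String, Entry>,
}

impl SketchCostTable {
    pub fn new(capacity: usize) -> Self {
        // Keep at least one slot so an insert can always succeed.
        let capacity = capacity.max(1);
        Self {
            capacity,
            tick: 0,
            entries: HashMap::with_capacity(capacity),
        }
    }

    /// Updates the cost of a known program, or inserts a new one,
    /// evicting one existing entry first if the table is full.
    pub fn upsert(&mut self, program: &str, cost: u64) {
        self.tick += 1;
        let tick = self.tick;
        if let Some(entry) = self.entries.get_mut(program) {
            entry.cost = cost;
            entry.occurrences += 1;
            entry.last_seen = tick;
            return;
        }
        if self.entries.len() >= self.capacity {
            // Assumed policy: evict the least-seen entry, oldest first on ties.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, e)| (e.occurrences, e.last_seen))
                .map(|(k, _)| k.clone());
            if let Some(key) = victim {
                self.entries.remove(&key);
            }
        }
        self.entries.insert(
            program.to_string(),
            Entry {
                cost,
                occurrences: 1,
                last_seen: tick,
            },
        );
    }

    pub fn get(&self, program: &str) -> Option<u64> {
        self.entries.get(program).map(|e| e.cost)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Bounding the table means an attacker cannot grow validator memory by invoking many distinct programs, at the price of occasionally forgetting the cost of a rarely used program.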

* investigate system performance test degradation (#17919)

* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocations when calculating cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initialize to default

* address potential deadlock by acquiring locks one at a time
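
The locking changes in this group (an `RwLock` instead of a `Mutex`, one acquisition per batch, and only one lock held at a time) might be sketched as below. `BlockCost`, `filter_batch`, and `commit_batch` are hypothetical names, and transactions are reduced to bare `u64` costs for brevity.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical shared state: total cost used so far in the current block.
pub struct BlockCost {
    pub limit: u64,
    pub used: u64,
}

/// Takes the read lock once for the whole batch and splits the costs into
/// (accepted, retryable) instead of locking per transaction.
pub fn filter_batch(tracker: &Arc<RwLock<BlockCost>>, costs: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let mut accepted = Vec::new();
    let mut retryable = Vec::new();
    // The read guard lives only inside this function, so it is released
    // before any caller asks for the write lock.
    let guard = tracker.read().unwrap();
    let mut pending = 0u64;
    for &cost in costs {
        if guard.used + pending + cost <= guard.limit {
            pending += cost;
            accepted.push(cost);
        } else {
            retryable.push(cost);
        }
    }
    (accepted, retryable)
}

/// Takes the write lock once to apply the cost of everything that was
/// actually processed.
pub fn commit_batch(tracker: &Arc<RwLock<BlockCost>>, processed: &[u64]) {
    let mut guard = tracker.write().unwrap();
    guard.used += processed.iter().sum::<u64>();
}
```

Because the read guard is dropped before the write lock is requested, another thread may update the tracker in between; the sketch accepts that race for simplicity.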

* Persist cost table to blockstore (#18123)

* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to the compaction exclusion list alongside TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Delete programs from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.

* log warning when channel send fails (#18391)

* Aggregate cost_model into cost_tracker (#18374)

* aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock.
* Simplified code, removed unused functions

* review fixes

* update ledger tool to restore cost table from blockstore (#18489)

* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test

* manually fix merge conflicts

* Per-program id timings (#17554)

* more manual fixing

* solve a merge conflict

* featurize cost model

* more merge fix

* cost model uses compute_unit to replace microsecond as cost unit (#18934)

* Reject blocks for costs above the max block cost (#18994)

* Update block max cost limit to fix performance regression (#19276)

* replace function with const var for better readability (#19285)

* Add a few more metrics data points (#19624)

* periodically report sigverify_stage stats (#19674)

* manual merge

* cost model nits (#18528)

* Accumulate consumed units (#18714)

* tx wide compute budget (#18631)

* more manual merge

* ignore zeroize drop security

* - update const cost values with data collected by #19627
- update cost calculation to closely match the proposed fee schedule #16984

* add transaction cost histogram metrics (#20350)

* rebase to 1.7.15

* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes

* remove cost_model feature_set

* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
2021-10-06 15:55:29 -06:00


//! The `tpu` module implements the Transaction Processing Unit, a
//! multi-stage transaction processing pipeline in software.

use crate::{
    banking_stage::BankingStage,
    broadcast_stage::{BroadcastStage, BroadcastStageType, RetransmitSlotsReceiver},
    cluster_info_vote_listener::{
        ClusterInfoVoteListener, GossipDuplicateConfirmedSlotsSender, GossipVerifiedVoteHashSender,
        VerifiedVoteSender, VoteTracker,
    },
    cost_model::CostModel,
    cost_tracker::CostTracker,
    fetch_stage::FetchStage,
    sigverify::TransactionSigVerifier,
    sigverify_stage::SigVerifyStage,
};
use crossbeam_channel::unbounded;
use solana_gossip::cluster_info::ClusterInfo;
use solana_ledger::{blockstore::Blockstore, blockstore_processor::TransactionStatusSender};
use solana_poh::poh_recorder::{PohRecorder, WorkingBankEntry};
use solana_rpc::{
    optimistically_confirmed_bank_tracker::BankNotificationSender,
    rpc_subscriptions::RpcSubscriptions,
};
use solana_runtime::{
    bank_forks::BankForks,
    vote_sender_types::{ReplayVoteReceiver, ReplayVoteSender},
};
use std::{
    net::UdpSocket,
    sync::{
        atomic::AtomicBool,
        mpsc::{channel, Receiver},
        Arc, Mutex, RwLock,
    },
    thread,
};

pub const DEFAULT_TPU_COALESCE_MS: u64 = 5;

pub struct Tpu {
    fetch_stage: FetchStage,
    sigverify_stage: SigVerifyStage,
    vote_sigverify_stage: SigVerifyStage,
    banking_stage: BankingStage,
    cluster_info_vote_listener: ClusterInfoVoteListener,
    broadcast_stage: BroadcastStage,
}

impl Tpu {
    #[allow(clippy::too_many_arguments)]
    pub fn new(
        cluster_info: &Arc<ClusterInfo>,
        poh_recorder: &Arc<Mutex<PohRecorder>>,
        entry_receiver: Receiver<WorkingBankEntry>,
        retransmit_slots_receiver: RetransmitSlotsReceiver,
        transactions_sockets: Vec<UdpSocket>,
        tpu_forwards_sockets: Vec<UdpSocket>,
        tpu_vote_sockets: Vec<UdpSocket>,
        broadcast_sockets: Vec<UdpSocket>,
        subscriptions: &Arc<RpcSubscriptions>,
        transaction_status_sender: Option<TransactionStatusSender>,
        blockstore: &Arc<Blockstore>,
        broadcast_type: &BroadcastStageType,
        exit: &Arc<AtomicBool>,
        shred_version: u16,
        vote_tracker: Arc<VoteTracker>,
        bank_forks: Arc<RwLock<BankForks>>,
        verified_vote_sender: VerifiedVoteSender,
        gossip_verified_vote_hash_sender: GossipVerifiedVoteHashSender,
        replay_vote_receiver: ReplayVoteReceiver,
        replay_vote_sender: ReplayVoteSender,
        bank_notification_sender: Option<BankNotificationSender>,
        tpu_coalesce_ms: u64,
        cluster_confirmed_slot_sender: GossipDuplicateConfirmedSlotsSender,
        cost_model: &Arc<RwLock<CostModel>>,
    ) -> Self {
        let (packet_sender, packet_receiver) = channel();
        let (vote_packet_sender, vote_packet_receiver) = channel();
        let fetch_stage = FetchStage::new_with_sender(
            transactions_sockets,
            tpu_forwards_sockets,
            tpu_vote_sockets,
            exit,
            &packet_sender,
            &vote_packet_sender,
            poh_recorder,
            tpu_coalesce_ms,
        );
        let (verified_sender, verified_receiver) = unbounded();

        let sigverify_stage = {
            let verifier = TransactionSigVerifier::default();
            SigVerifyStage::new(packet_receiver, verified_sender, verifier)
        };

        let (verified_tpu_vote_packets_sender, verified_tpu_vote_packets_receiver) = unbounded();

        let vote_sigverify_stage = {
            let verifier = TransactionSigVerifier::new_reject_non_vote();
            SigVerifyStage::new(
                vote_packet_receiver,
                verified_tpu_vote_packets_sender,
                verifier,
            )
        };

        let (verified_gossip_vote_packets_sender, verified_gossip_vote_packets_receiver) =
            unbounded();
        let cluster_info_vote_listener = ClusterInfoVoteListener::new(
            exit,
            cluster_info.clone(),
            verified_gossip_vote_packets_sender,
            poh_recorder,
            vote_tracker,
            bank_forks.clone(),
            subscriptions.clone(),
            verified_vote_sender,
            gossip_verified_vote_hash_sender,
            replay_vote_receiver,
            blockstore.clone(),
            bank_notification_sender,
            cluster_confirmed_slot_sender,
        );

        let cost_tracker = Arc::new(RwLock::new(CostTracker::new(cost_model.clone())));
        let banking_stage = BankingStage::new(
            cluster_info,
            poh_recorder,
            verified_receiver,
            verified_tpu_vote_packets_receiver,
            verified_gossip_vote_packets_receiver,
            transaction_status_sender,
            replay_vote_sender,
            cost_tracker,
        );

        let broadcast_stage = broadcast_type.new_broadcast_stage(
            broadcast_sockets,
            cluster_info.clone(),
            entry_receiver,
            retransmit_slots_receiver,
            exit,
            blockstore,
            &bank_forks,
            shred_version,
        );

        Self {
            fetch_stage,
            sigverify_stage,
            vote_sigverify_stage,
            banking_stage,
            cluster_info_vote_listener,
            broadcast_stage,
        }
    }

    pub fn join(self) -> thread::Result<()> {
        let results = vec![
            self.fetch_stage.join(),
            self.sigverify_stage.join(),
            self.vote_sigverify_stage.join(),
            self.cluster_info_vote_listener.join(),
            self.banking_stage.join(),
        ];
        let broadcast_result = self.broadcast_stage.join();
        for result in results {
            result?;
        }
        let _ = broadcast_result?;
        Ok(())
    }
}