Commit Graph

141 Commits

Author SHA1 Message Date
Tao Zhu
db85d659b9 Cost model 1.7 (#20188)
* Cost Model to limit transactions which are not parallelizeable (#16694)

* * Add following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker atomically try_add operation, serves as safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations

* replay stage feed back program cost (#17731)

* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized into empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.

* investigate system performance test degradation  (#17919)

* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocation when calculate cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at time

* Persist cost table to blockstore (#18123)

* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.

* log warning when channel send fails (#18391)

* Aggregate cost_model into cost_tracker (#18374)

* * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions

* review fixes

* update ledger tool to restore cost table from blockstore (#18489)

* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test

* manually fix merge conflicts

* Per-program id timings (#17554)

* more manual fixing

* solve a merge conflict

* featurize cost model

* more merge fix

* cost model uses compute_unit to replace microsecond as cost unit
(#18934)

* Reject blocks for costs above the max block cost (#18994)

* Update block max cost limit to fix performance regession (#19276)

* replace function with const var for better readability (#19285)

* Add few more metrics data points (#19624)

* periodically report sigverify_stage stats (#19674)

* manual merge

* cost model nits (#18528)

* Accumulate consumed units (#18714)

* tx wide compute budget (#18631)

* more manual merge

* ignore zerorize drop security

* - update const cost values with data collected by #19627
- update cost calculation to closely proposed fee schedule #16984

* add transaction cost histogram metrics (#20350)

* rebase to 1.7.15

* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes

* remove cost_model feature_set

* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
2021-10-06 15:55:29 -06:00
mergify[bot]
2a93147b1b Add voting service (#18552) (#18722)
Co-authored-by: sakridge <sakridge@gmail.com>
2021-07-16 22:12:04 +02:00
mergify[bot]
c534c928a7 encapsulates turbine peers computations of broadcast & retransmit stages (#18238) (#18464)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.

(cherry picked from commit 04787be8b1)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-07 14:30:55 +00:00
mergify[bot]
e247625025 Create solana-poh and move remaining rpc modules to solana-rpc (backport #17698) (#17745)
* Create solana-poh and move remaining rpc modules to solana-rpc (#17698)

* Create solana-poh crate

* Move BigTableUploadService to solana-ledger

* Add solana-rpc to workspace

* Move dependencies to solana-rpc

* Move remaining rpc modules to solana-rpc

* Single use statement solana-poh

* Single use statement solana-rpc

(cherry picked from commit 544b3c0d17)

# Conflicts:
#	Cargo.lock
#	banking-bench/Cargo.toml
#	core/Cargo.toml
#	core/benches/banking_stage.rs
#	local-cluster/Cargo.toml
#	rpc/Cargo.toml
#	stake-monitor/Cargo.toml
#	validator/Cargo.toml

* Fix conflicts & versions

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-06-04 18:19:08 +00:00
Tyera Eulberg
ab581dafc2 Add block height to ConfirmedBlock structs (#17523)
* Add BlockHeight CF to blockstore

* Rename CacheBlockTimeService to be more general

* Cache block-height using service

* Fixup previous proto mishandling

* Add block_height to block structs

* Add block-height to solana block

* Fallback to BankForks if block time or block height are not yet written to Blockstore

* Add docs

* Review comments
2021-05-26 22:16:16 -06:00
Tyera Eulberg
9a5330b7eb Move gossip modules into solana-gossip crate (#17352)
* Move gossip modules to solana-gossip

* Update Protocol abi digest due to move

* Move gossip benches and hook up CI

* Remove unneeded Result entries

* Single use statements
2021-05-26 09:15:46 -06:00
Tao Zhu
0781fe1b4f Upgrade Rust to 1.52.0 (#17096)
* Upgrade Rust to 1.52.0
update nightly_version to newly pushed docker image
fix clippy lint errors
1.52 comes with grcov 0.8.0, include this version to script

* upgrade to Rust 1.52.1

* disabling Serum from downstream projects until it is upgraded to Rust 1.52.1
2021-05-19 09:31:47 -05:00
Tyera Eulberg
827355a6b1 Create solana-rpc crate and move subscriptions (#17320)
* Move non_circulating_supply to runtime

* Add solana-rpc crate and move max_slots

* Move subscriptions to solana-rpc

* Single use statements
2021-05-19 00:54:28 -06:00
behzad nouri
b17d5eeaee moves cluster-info metrics to a separate module (#16883) 2021-04-28 02:04:49 +00:00
carllin
4c94f8933f Ingest votes from gossip into fork choice (#16560) 2021-04-21 14:40:35 -07:00
sakridge
8e69dd42c1 Add non-default repair nonce values (#16512)
* Track outstanding nonces in repair

* Rework outstanding requests to use lru cache and randomize nonces

Co-authored-by: Carl <carl@solana.com>
2021-04-20 09:37:33 -07:00
carllin
52703badfa Setup ReplayStage confirmation scaffolding for duplicate slots (#9698) 2021-03-24 23:41:52 -07:00
Justin Starry
918d04e3f0 Add more slot update notifications (#15734)
* Add more slot update notifications

* fix merge

* Address feedback and add integration test

* switch to datapoint

* remove unused shred method

* fix clippy

* new thread for rpc completed slots

* remove extra constant

* fixes

* rely on channel closing

* fix check
2021-03-12 21:44:06 +08:00
sakridge
1b59b163dd Add max retransmit and shred insert slot (#15475) 2021-02-23 13:06:33 -08:00
Trent Nelson
7f7370c306 Re-allow clippy::integer_arithmetic at crate-level 2021-02-17 13:55:08 -07:00
behzad nouri
b6f231b60e removes locked pubkey references (#15152) 2021-02-08 02:07:00 +00:00
behzad nouri
6a3797e164 adds crds-value for broadcasting duplicate shreds through gossip (#14133)
In gossip, the header overhead we get from:
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/cluster_info.rs#L434-L435
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L31-L36
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L73
already exceeds SIZE_OF_NONCE in shreds. We also need aditional
meta-data (wallclock, source pubkey, ...). Which means that given the
SHRED_PAYLOAD_SIZE, we cannot fit all these in PACKET_DATA_SIZE:
https://github.com/solana-labs/solana/blob/de9ac43eb/ledger/src/shred.rs#L80

On top of that, we need 2 shred payloads as the proof of duplicate. So
each DuplicateShred crds value includes only a chunk of the payload,
along with the meta-data to reconstruct the full payload from the chunks
on the receiving end.
2020-12-18 14:32:43 +00:00
sakridge
d4a174fb7c Partial shred deserialize cleanup and shred type differentiation (#14094)
* Partial shred deserialize cleanup and shred type differentiation in retransmit

* consolidate packet hashing logic
2020-12-15 16:50:40 -08:00
Michael Vines
cdd3e7d856 Remove solana-vote-signer (#14099) 2020-12-13 19:12:20 -08:00
carllin
239a191612 Remove unneeded BankWeight fork choice (#13978)
Co-authored-by: Carl Lin <carl@solana.com>
2020-12-07 13:47:14 -08:00
behzad nouri
ae91270961 implements ping-pong packets between nodes (#12794)
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
2020-10-28 17:03:02 +00:00
Michael Vines
6858950f76 Remove frozen ABI modules from solana-sdk 2020-10-20 16:11:30 -07:00
behzad nouri
05cf15a382 implements DataBudget using atomics (#12856) 2020-10-15 11:33:58 +00:00
Michael Vines
247228ee61 Implementation-defined RPC server errors are now accessible to client/ users 2020-10-13 10:05:44 -07:00
Ryo Onodera
1f4bcf70b0 Fix various ledger-tool error due to no builtins (#12759)
* Fix various ledger-tool error due to no builtins

* Add missing file...
2020-10-09 12:19:36 -06:00
Tyera Eulberg
89621adca7 Rpc -> proper optimistic confirmation (#12514)
* Add service to track the most recent optimistically confirmed bank

* Plumb service into ClusterInfoVoteListener and ReplayStage

* Clean up test

* Use OptimisticallyConfirmedBank in RPC

* Remove superfluous notifications from RpcSubscriptions

* Use crossbeam to avoid mpsc recv_timeout panic

* Review comments

* Remove superfluous last_checked_slots, but pass in OptimisticallyConfirmedBank for complete correctness
2020-09-28 20:43:05 -06:00
carllin
06f84c65f1 Fix rooted accounts cleanup, simplify locking (#12194)
Co-authored-by: Carl Lin <carl@solana.com>
2020-09-28 16:04:46 -07:00
Michael Vines
31696a1d72 Port BPFLoader2 activation to FeatureSet and rework built-in program activation 2020-09-28 12:50:19 -07:00
Josh
65a6bfad09 Add blockstore column to store performance sampling data (#12251)
* Add blockstore column to store performance sampling data

* introduce timer and write performance metrics to blockstore

* introduce getRecentPerformanceSamples rpc

* only run on rpc nodes enabled with transaction history

* add unit tests for get_recent_performance_samples

* remove RpcResponse from rpc call

* refactor to use Instant::now and elapsed for timer

* switch to root bank and ensure not negative subraction

* Add PerfSamples to purge/compaction

* refactor to use Instant::now and elapsed for timer

* switch to root bank and ensure not negative subraction

* remove duplicate constants

Co-authored-by: Tyera Eulberg <tyera@solana.com>
2020-09-22 12:26:32 -07:00
Michael Vines
208dd1de3a Move TestValidator into its own module 2020-09-19 08:35:26 -07:00
behzad nouri
9b866d79fb shards crds values based on their hash prefix (#12187)
filter_crds_values checks every crds filter against every hash value:
https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432
which can be inefficient if the filter's bit-mask only matches small
portion of the entire crds table.

This commit shards crds values into separate tables based on shard_bits
first bits of their hash prefix. Given a (mask, mask_bits) filter,
filtering crds can be done by inspecting only relevant shards.

If CrdsFilter.mask_bits <= shard_bits, then precisely only the crds
values which match (mask, mask_bits) bit pattern are traversed.
If CrdsFilter.mask_bits > shard_bits, then approximately only
1/2^shard_bits of crds values are inspected.

Benchmarking on a gce cluster of 20 nodes, I see ~10% improvement in
generate_pull_responses metric, but with larger clusters, crds table and
2^mask_bits are both larger, so the impact should be more significant.
2020-09-17 14:05:16 +00:00
Tyera Eulberg
05db41fe9c Cache block time in Blockstore (#11955)
* Add blockstore column to cache block times

* Add method to cache block time

* Add service to cache block time

* Update rpc getBlockTime to use new method, and refactor blockstore slightly

* Return block_time with confirmed block, if available

* Add measure and warning to cache-block-time
2020-09-09 09:33:14 -06:00
anatoly yakovenko
c67f8bd821 Forward transactions to the expected leader instead of your own TPU port (#12004)
* Use PoHRecorder to send to the right leader

* cleanup

* fmt

* clippy

* Cleanup, fix bug

Co-authored-by: Carl <carl@solana.com>
2020-09-08 17:00:49 +08:00
Michael Vines
bc7731b969 Add BigTableUploadService 2020-09-04 16:01:49 -07:00
carllin
1c1a3f979d Detect and notify when deserializable shreds are available (#11816)
* Add logic to check for complete data ranges

* Add RPC signature notification

Co-authored-by: Carl <carl@solana.com>
2020-09-01 22:06:06 -07:00
Jack May
7c736f71fe Make BPF Loader static (#11516) 2020-08-14 12:32:45 -07:00
carllin
7ef50a9352 Move cluster slots update to separate thread (#11523)
* Add cluster_slots_service

Co-authored-by: Carl <carl@solana.com>
2020-08-11 12:48:13 -07:00
Greg Fitzgerald
edadd5d6d5 Remove Budget from CLI (#11451)
* Remove support for Budget

Also:
* Make "pay" command a deprecated alias for the "transfer" command

* chore: remove budget from web3.js

* Drop Budget depedency from core

Validators no longer ship with builtin Budget
2020-08-07 16:01:51 -06:00
carllin
a7ea340f22 Track votes from gossip for optimistic confirmation (#11209)
* Add check in cluster_info_vote_listenere to see if optimstic conf was achieved
Add OptimisticConfirmationVerifier

* More fixes

* Fix merge conflicts

* Remove gossip notificatin

* Add dashboards

* Fix rebase

* Count switch votes as well toward optimistic conf

* rename

Co-authored-by: Carl <carl@solana.com>
2020-07-28 09:33:27 +00:00
carllin
e9cbdf711b Add TreeDiff trait to reuse tree functions (#11046)
Co-authored-by: Carl <carl@solana.com>
2020-07-14 07:38:48 +00:00
Greg Fitzgerald
16eeea4f82 Move SendTransactionService to solana_runtime (#10972) 2020-07-09 18:28:26 +00:00
carllin
3f6042d8b3 Add RepairWeight to track votes seen in gossip for weighted repair (#10903)
* Add RepairWeight

Co-authored-by: Carl <carl@solana.com>
2020-07-06 22:49:40 -07:00
Ryo Onodera
39b3ac6a8d Introduce automatic ABI maintenance mechanism (2/2; rollout) (#8012)
* Introduce automatic ABI maintenance mechanism (2/2; rollout)

* Fix stable clippy

* Change to symlink

* Freeze abi of Tower

* fmt...

* Improve dev-experience!

* Update BankSlotDelta

$ diff -u /tmp/abi8/*7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj* /tmp/abi8/*9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w*
--- /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj      2020-06-18 18:01:22.831228087 +0900
+++ /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w      2020-07-03 15:59:58.430695244 +0900
@@ -140,7 +140,7 @@
                                                         field u8
                                                             primitive u8
                                                         field solana_sdk::instruction::InstructionError
-                                                            enum InstructionError (variants = 34)
+                                                            enum InstructionError (variants = 35)
                                                                 variant(0) GenericError (unit)
                                                                 variant(1) InvalidArgument (unit)
                                                                 variant(2) InvalidInstructionData (unit)
@@ -176,6 +176,7 @@
                                                                 variant(31) CallDepth (unit)
                                                                 variant(32) MissingAccount (unit)
                                                                 variant(33) ReentrancyNotAllowed (unit)
+                                                                variant(34) MaxSeedLengthExceeded (unit)
                                                     variant(9) CallChainTooDeep (unit)
                                                     variant(10) MissingSignatureForFee (unit)
                                                     variant(11) InvalidAccountIndex (unit)

* Fix some merge conflicts...
2020-07-06 20:22:23 +09:00
carllin
f17ac70bb2 Add weighted traversal (#10877)
Co-authored-by: Carl <carl@solana.com>
2020-07-02 14:33:04 -07:00
Greg Fitzgerald
50b3fa83a0 Move BankCommitmentCache to solana_runtime (#10816)
* Remove Blockstore member variable from BlockCommitmentCache

* Hoist is_confirmed_rooted() to its only caller

BlockCommitmentCache no longer depends on Blockstore

* Move BlockCommitmentCache to solana_runtime
2020-06-25 22:06:58 -06:00
Greg Fitzgerald
8dd5384d6d Split commitment module (#10541)
automerge
2020-06-12 17:16:10 -07:00
carllin
2e1d59ff85 Adopt heaviest subtree fork choice rule (#10441)
* Add HeaviestSubtreeForkChoice

* Make replay stage switch between two fork choice rules

Co-authored-by: Carl <carl@solana.com>
2020-06-11 12:16:04 -07:00
Michael Vines
ed351400b2 Add SendTransactionService 2020-06-09 07:46:40 -07:00
Michael Vines
9dbf3d5430 Factor out RpcHealth module 2020-06-01 17:51:04 -07:00
carllin
97f2bcff69 master: Add nonce to shreds repairs, add shred data size to header (#10109)
* Add nonce to shreds/repairs

* Add data shred size to header

Co-authored-by: Carl <carl@solana.com>
2020-05-19 12:38:18 -07:00