Commit Graph

2468 Commits

Author SHA1 Message Date
Michael Vines
008139f506 Bump version to v1.8.7 2021-12-01 11:44:18 -08:00
mergify[bot]
b030d4be7c Add bank drop service (#21322) (#21360)
(cherry picked from commit 0bda0c3e0c)

Co-authored-by: sakridge <sakridge@gmail.com>
2021-11-24 16:58:42 +00:00
Tyera Eulberg
336ee01aae Bump version to v1.8.6 (#21329) 2021-11-17 17:16:20 -07:00
mergify[bot]
064cce41f7 add --no-os-network-stats-reporting option (backport #21296) (#21303)
* add --no-os-network-stats-reporting option (#21296)

(cherry picked from commit d5de0c8e12)

# Conflicts:
#	core/src/system_monitor_service.rs
#	ledger-tool/src/main.rs
#	validator/src/main.rs

* resolve merge conflicts

Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
2021-11-16 23:00:54 +00:00
sakridge
80c3591391 Bump version to 1.8.5 (#21295) 2021-11-16 11:58:57 -07:00
mergify[bot]
0e6b476cbf More set_root metrics (backport #21286) (#21290)
* More set_root metrics (#21286)

(cherry picked from commit 398af132a5)

# Conflicts:
#	core/src/replay_stage.rs

* Fix conflict

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-11-16 02:29:33 +00:00
Ivan Mironov
8072635967 Add validator options to change priority of snapshot packager and RPC threads (v1.8) (#21020)
* Add function for changing thread's nice value

Linux only.

* Add validator option to change niceness of snapshot packager thread

* Add validator option to change niceness of RPC server threads

Fixes https://github.com/solana-labs/solana/issues/14556

* Run `./scripts/cargo-for-all-lock-files.sh tree`
2021-11-15 09:50:45 -08:00
mergify[bot]
a86fdebb3b Bump timeout on test_rpc_subscriptions (#21259) (#21262)
* Extend timeout

* Add log

(cherry picked from commit 156b8ead42)

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-11-13 14:23:48 +00:00
mergify[bot]
21cd423e67 Rename "trusted" to "known" in validators/ (backport #21197) (#21255)
* Rename "trusted" to "known" in `validators/` (#21197)

* Replaced trusted with known validator

* Format Convention

(cherry picked from commit b0ca335463)

# Conflicts:
#	core/src/accounts_hash_verifier.rs
#	core/src/serve_repair.rs
#	rpc/src/rpc_service.rs
#	validator/src/bootstrap.rs

* Fix conflicts

Co-authored-by: Michael Keleti <16996410+mkeleti@users.noreply.github.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-11-12 21:36:53 +00:00
Tyera Eulberg
09ef4d12f7 Bump version to 1.8.4 (#21232) (#21235)
Co-authored-by: sakridge <sakridge@gmail.com>
2021-11-10 15:20:41 -07:00
Tyera Eulberg
74684a107c Revert "Bump version to 1.8.4 (#21232)"
This reverts commit 19b3ba0442.
2021-11-10 11:19:39 -07:00
sakridge
19b3ba0442 Bump version to 1.8.4 (#21232) 2021-11-10 16:38:44 +01:00
mergify[bot]
56fc58a2b5 Simplify replay vote tracking by using packet metadata (backport #21112) (#21149)
* Simplify replay vote tracking by using packet metadata (#21112)

(cherry picked from commit 140a5f633d)

# Conflicts:
#	core/src/banking_stage.rs
#	ledger-tool/src/main.rs
#	rpc/src/rpc.rs
#	runtime/src/bank.rs
#	runtime/src/bank_utils.rs
#	runtime/src/stakes.rs
#	sdk/src/transaction/sanitized.rs

* resolve conflicts

Co-authored-by: Justin Starry <justin@solana.com>
2021-11-05 00:16:34 +00:00
mergify[bot]
506d39ea82 Add missing websocket methods to rust RPC PubSub client (backport #21065) (#21073)
* Add missing websocket methods to rust RPC PubSub client (#21065)

- Added accountSubscribe,  programSubscribe, slotSubscribe and rootSubscribe to rust RpcClient
 - Removed duplication on cleanup threads
 - Moved RPCVote from rpc/ to client/rpc_response

(cherry picked from commit a0f9e0e8ee)

# Conflicts:
#	Cargo.lock
#	client-test/Cargo.toml
#	core/tests/client.rs

* Fix conflicts

* Make test result not depend on TestValidator setup

Co-authored-by: Manuel Gil <manugildev@gmail.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-10-29 21:46:59 +00:00
Tyera Eulberg
8dd3c1ece1 Bump version to v1.8.3 (#21040) 2021-10-28 11:17:11 -06:00
mergify[bot]
a1f1264962 Swap banking stage vote channels (#20987) (#21000)
(cherry picked from commit 261dd96ae3)

Co-authored-by: sakridge <sakridge@gmail.com>
2021-10-27 20:17:38 +00:00
mergify[bot]
de1f60fb2d Refactor cost tracker metrics reporting (backport #20802) (#20933)
* - cost_tracker is data member of a bank, it can report metrics when bank is frozen (#20802)

- removed cost_tracker_stats and histogram
- move stats reporting outside of bank freeze

(cherry picked from commit c2bfce90b3)

# Conflicts:
#	Cargo.lock
#	core/src/banking_stage.rs
#	core/src/replay_stage.rs
#	core/src/tvu.rs
#	ledger-tool/src/main.rs
#	programs/bpf/Cargo.lock
#	runtime/Cargo.toml
#	runtime/src/cost_tracker.rs

* manual fix merge conflicts

Co-authored-by: Tao Zhu <82401714+taozhu-chicago@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
2021-10-27 16:48:20 +00:00
Tao Zhu
7528016e2d Add counter for dropped duplicated packets, fix dropped_packets_count (#20834) (#21023)
(cherry picked from commit 71d0bd4605)
2021-10-27 11:36:37 -05:00
mergify[bot]
7e7f8ef5f0 Report timing info for stakes cache updates from txs (backport #20856) (#20884)
* Report timing info for stakes cache updates from txs (#20856)

(cherry picked from commit 735016661b)

# Conflicts:
#	runtime/src/bank.rs

* resolve conflicts

Co-authored-by: Justin Starry <justin@solana.com>
2021-10-26 20:04:32 +00:00
mergify[bot]
8986bd301c adds metrics tracking gossip crds writes and votes (backport #20953) (#20982)
* adds metrics tracking crds writes and votes (#20953)

(cherry picked from commit 1297a13586)

# Conflicts:
#	core/src/cluster_nodes.rs
#	gossip/benches/crds_shards.rs
#	gossip/src/cluster_info.rs
#	gossip/src/cluster_info_metrics.rs
#	gossip/src/crds_entry.rs
#	gossip/src/crds_gossip.rs
#	gossip/src/crds_gossip_pull.rs
#	gossip/src/crds_gossip_push.rs
#	gossip/src/crds_shards.rs
#	gossip/tests/crds_gossip.rs
#	rpc/src/rpc_service.rs

* updates itertools version in gossip

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-10-26 17:41:45 +00:00
behzad nouri
df6063a622 removes backport merge conflicts 2021-10-24 21:29:29 -07:00
behzad nouri
55a1f03eee adds metrics for number of outgoing shreds in retransmit stage (#20882)
(cherry picked from commit 5e1cf39c74)

# Conflicts:
#	core/src/retransmit_stage.rs
2021-10-24 21:29:29 -07:00
sakridge
d20cccc26b Add check for shred data header size (#20668)
(cherry picked from commit 588168b99d)
2021-10-24 20:16:41 -07:00
Trent Nelson
23b6ce7980 Bump version to 1.8.2 2021-10-21 00:43:40 -06:00
mergify[bot]
8cba6cca76 rpc-send-tx-svc server-side retry knobs (backport #20818) (#20830)
* rpc-send-tx-svc: add with_config constructor

(cherry picked from commit fe098b5ddc)

# Conflicts:
#	Cargo.lock
#	core/Cargo.toml
#	replica-node/Cargo.toml
#	rpc/src/rpc_service.rs
#	rpc/src/send_transaction_service.rs
#	validator/Cargo.toml

* rpc-send-tx-svc: server-side retry knobs

(cherry picked from commit 2744a2128c)

Co-authored-by: Trent Nelson <trent@solana.com>
2021-10-21 02:15:03 +00:00
mergify[bot]
436ec212f4 report udp stats from validator (backport #20587) (#20799)
* report udp stats from validator (#20587)

(cherry picked from commit 4cac66244d)

# Conflicts:
#	core/src/validator.rs

* resolve merge conflicts

Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
2021-10-20 00:57:38 +00:00
mergify[bot]
28eb6ff796 Invoke cost tracker from its bank (backport #20627) (#20800)
* - make cost_tracker a member of bank, remove shared instance from TPU; (#20627)

- decouple cost_model from cost_tracker; allowing one cost_model
  instance being shared within a validator;
- update cost_model api to calculate_cost(&self...)->transaction_cost

(cherry picked from commit 7496b5784b)

# Conflicts:
#	core/src/banking_stage.rs
#	ledger-tool/src/main.rs
#	runtime/src/bank.rs
#	runtime/src/cost_model.rs
#	runtime/src/cost_tracker.rs

* manual fix merge conflicts

Co-authored-by: Tao Zhu <82401714+taozhu-chicago@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
2021-10-20 00:22:38 +00:00
mergify[bot]
de32ab4d57 Separate out interrupted slots broadcast metrics (#20537) (#20798)
(cherry picked from commit 838ff3b871)

Co-authored-by: carllin <carl@solana.com>
2021-10-19 22:26:49 +00:00
Sean Young
c8f6a0817b verify_precompiles needs FeatureSet
Rather than pass in individual features, pass in the entire feature set
so that we can add the ed25519 program feature in a later commit.

(cherry picked from commit 0f62771f42)

 Conflicts:
	banks-server/src/banks_server.rs
	core/src/banking_stage.rs
	programs/secp256k1/src/lib.rs
	rpc/src/rpc.rs
	runtime/src/bank.rs
	sdk/src/transaction.rs
	sdk/src/transaction/sanitized.rs
2021-10-18 15:41:24 +01:00
mergify[bot]
eaa6d1a4b5 adds counters for errors in window-service run_insert (#20670) (#20724)
(cherry picked from commit 0f03971c3c)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-10-15 18:09:11 +00:00
mergify[bot]
400a88786a aggregate cost_tracker to bank (backport #20527) (#20622)
* - move cost tracker into bank, so each bank has its own cost tracker; (#20527)

- move related modules to runtime

(cherry picked from commit 005d6863fd)

# Conflicts:
#	Cargo.lock
#	core/benches/banking_stage.rs
#	core/src/banking_stage.rs
#	core/src/lib.rs
#	core/src/tvu.rs
#	ledger-tool/src/main.rs
#	ledger/src/blockstore_processor.rs
#	programs/bpf/Cargo.lock
#	runtime/Cargo.toml
#	runtime/src/cost_model.rs

* manual fix merge conflicts

Co-authored-by: Tao Zhu <82401714+taozhu-chicago@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
2021-10-13 05:07:09 +00:00
Michael Vines
50803c3f58 Rework AVX/AVX2 detection again
(cherry picked from commit c16510152e)
2021-10-10 15:28:14 -07:00
Lijun Wang
50cb612ae1 Accountsdb stream plugin improvement (#20419) (#20573)
* Accountsdb stream plugin improvement (#20419)

Support using connection pooling and use multiple threads to do Postgres db operations. The performance is improved from 1500 RPS to 40,000 RPS measured during validator start.

Support multiple plugins at the same time.

* Fixed a fmt issue
2021-10-10 15:24:12 -07:00
behzad nouri
c693ecc4c8 fixes backports code changes (#20541) 2021-10-08 15:30:27 +00:00
Lijun Wang
7d0494fcaa Merge AccountsDb plugin framework to v1.8 (#20518)
Merge AccountsDb plugin framework to v1.8 (#20518)
Summary of Changes

Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows

Data stores of connection strings/credentials to be configured,
Accounts with patterns to be streamed
PostgreSQL implementation of the streaming for different destination stores to be plugged in.

The code comprises 4 major parts:

accountsdb-plugin-intf: defines the plugin interface which concrete plugin should implement.
accountsdb-plugin-manager: manages the load/unload of plugins and provide interfaces which the validator can notify of accounts update to plugins.
accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL
The validator integrations: updated streamed right after snapshot restore and after account update from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
2021-10-07 14:15:05 -07:00
Tao Zhu
348ba57b12 Bump version to 1.8.1 2021-10-06 17:57:06 -07:00
Tao Zhu
db85d659b9 Cost model 1.7 (#20188)
* Cost Model to limit transactions which are not parallelizeable (#16694)

* * Add following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker atomically try_add operation, serves as safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations

* replay stage feed back program cost (#17731)

* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized into empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.

* investigate system performance test degradation  (#17919)

* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocation when calculate cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at time

* Persist cost table to blockstore (#18123)

* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.

* log warning when channel send fails (#18391)

* Aggregate cost_model into cost_tracker (#18374)

* * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions

* review fixes

* update ledger tool to restore cost table from blockstore (#18489)

* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test

* manually fix merge conflicts

* Per-program id timings (#17554)

* more manual fixing

* solve a merge conflict

* featurize cost model

* more merge fix

* cost model uses compute_unit to replace microsecond as cost unit
(#18934)

* Reject blocks for costs above the max block cost (#18994)

* Update block max cost limit to fix performance regession (#19276)

* replace function with const var for better readability (#19285)

* Add few more metrics data points (#19624)

* periodically report sigverify_stage stats (#19674)

* manual merge

* cost model nits (#18528)

* Accumulate consumed units (#18714)

* tx wide compute budget (#18631)

* more manual merge

* ignore zerorize drop security

* - update const cost values with data collected by #19627
- update cost calculation to closely proposed fee schedule #16984

* add transaction cost histogram metrics (#20350)

* rebase to 1.7.15

* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes

* remove cost_model feature_set

* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
2021-10-06 15:55:29 -06:00
Trent Nelson
a4df784e82 Bump version to 1.8.0 2021-10-06 15:48:23 -06:00
Justin Starry
d922971ec6 Optimize stakes cache and rewards at epoch boundaries (backport #20432) (#20472)
* Optimize stakes cache and rewards at epoch boundaries (backport #20432)

* fix conflicts
2021-10-06 16:15:27 +00:00
Tyera Eulberg
734b380cdb Bump version to v1.7.15 (#20338) 2021-09-30 10:51:34 -06:00
sakridge
474f2bcdf4 Prune sigverify queue (#20315) 2021-09-30 05:40:48 +02:00
sakridge
fec15f69f4 Increment 1.7 version (#20316) 2021-09-29 15:37:45 -04:00
sakridge
257ddbeee1 Tpu vote 1.7 (#20187)
* Add separate vote processing tpu port

* Add feature to send to tpu vote port

* Add vote rejecting sigverify mode

* use packet.meta.is_simple_vote_tx in place of deserialization

* consolidate code that identifies vote tx atcommon path for cpu and gpu

* new key for feature set

* banking forward tpu vote

* add tpu vote port to dockerfile and other review changes

* Simplify thread id compare

* fix a test; updated cluster_info ABI change

Co-authored-by: Tao Zhu <tao@solana.com>
2021-09-29 18:12:58 +02:00
mergify[bot]
47c1730808 uses rayon thread-pool for retransmit-stage parallelization (#19486) (#20293)
(cherry picked from commit 01a7ec8198)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-29 14:11:46 +00:00
mergify[bot]
2f2948f998 Extricate RpcCompletedSlotsService from RetransmitStage (backport #18017) (#20294)
* Extricate RpcCompletedSlotsService from RetransmitStage

(cherry picked from commit fa04531c7a)

# Conflicts:
#	core/src/replay_stage.rs
#	core/src/retransmit_stage.rs
#	core/src/tvu.rs
#	core/src/validator.rs

* removes backport merge conflicts

Co-authored-by: Michael Vines <mvines@gmail.com>
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 18:25:51 +00:00
mergify[bot]
55ccff7604 skips retransmit for shreds with unknown slot leader (backport #19472) (#20291)
* skips retransmit for shreds with unknown slot leader (#19472)

Shreds' signatures should be verified before they reach retransmit
stage, and if the leader is unknown they should fail signature check.
Therefore retransmit-stage can as well expect to know who the slot
leader is and otherwise just skip the shred.

Blockstore checking signature of recovered shreds before sending them to
retransmit stage:
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/blockstore.rs#L884-L930

Shred signature verifier:
https://github.com/solana-labs/solana/blob/4305d4b7b/core/src/sigverify_shreds.rs#L41-L57
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/sigverify_shreds.rs#L105

(cherry picked from commit 6d9818b8e4)

# Conflicts:
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	ledger/src/shred.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 15:26:30 +00:00
mergify[bot]
1bf88556ee removes Slot from TransmitShreds (backport #19327) (#20260)
* removes Slot from TransmitShreds (#19327)

An earlier version of the code was funneling through stakes along with
shreds to broadcast:
https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127

This was changed to only slots as stakes computation was pushed further
down the pipeline in:
https://github.com/solana-labs/solana/pull/18971

However shreds themselves embody which slot they belong to. So pairing
them with slot is redundant and adds rooms for bugs should they become
inconsistent.

(cherry picked from commit 1deb4add81)

# Conflicts:
#	core/benches/cluster_info.rs
#	core/src/broadcast_stage.rs
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 12:55:01 +00:00
mergify[bot]
a1a0c63862 retransmits shreds recovered from erasure codes (backport #19233) (#20249)
* removes packet-count metrics from retransmit stage

Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.

Following commit will add these metrics back to window-service, earlier
in the pipeline.

(cherry picked from commit bf437b0336)

# Conflicts:
#	core/src/retransmit_stage.rs

* adds packet/shred count stats to window-service

Adding back these metrics from the earlier commit which removed them
from retransmit stage.

(cherry picked from commit 8198a7eae1)

* removes erroneous uses of Arc<...> from retransmit stage

(cherry picked from commit 6e413331b5)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/tvu.rs

* sends shreds (instead of packets) to retransmit stage

Working towards channelling through shreds recovered from erasure codes
to retransmit stage.

(cherry picked from commit 3efccbffab)

# Conflicts:
#	core/src/retransmit_stage.rs

* returns completed-data-set-info from insert_data_shred

instead of opaque (u32, u32) which are then converted to
CompletedDataSetInfo at the call-site.

(cherry picked from commit 3c71670bd9)

# Conflicts:
#	ledger/src/blockstore.rs

* retransmits shreds recovered from erasure codes

Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.

(cherry picked from commit 7a8807b8bb)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-27 18:11:37 +00:00
mergify[bot]
502ae8b319 removes repeated bank-forks locking in window-service (backport #19210) (#20239)
* removes repeated bank-forks locking in window-service

Window service is repeatedly locking bank-forks to look-up working-bank
for every single shred:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L597-L606

This commit updates shred_filter signature in recv_window so that where
we already obtain the lock on bank-forks, we can also look-up
working-bank once for all packets:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L256-L277

(cherry picked from commit d57398a959)

# Conflicts:
#	core/src/window_service.rs

* removes erroneous uses of &Arc<...> from window-service

(cherry picked from commit b64eeb7729)

# Conflicts:
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-27 13:48:58 +00:00
mergify[bot]
d68377e927 unifies cluster-nodes computation & caching across turbine stages (backport #18971) (#20231)
* sends slots (instead of stakes) through broadcast flow

Current broadcast code is computing stakes for each slot before sending
them down the channel:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

Since the stakes are a function of epoch the slot belongs to (and so
does not necessarily change from one slot to another), forwarding the
slot itself would allow better caching downstream.

In addition we need to invalidate the cache if the epoch changes (which
the current code does not do), and that requires to know which slot (and
so epoch) current broadcasted shreds belong to:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344

(cherry picked from commit 44b11154ca)

# Conflicts:
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs

* implements cluster-nodes cache

Cluster nodes are cached keyed by the respective epoch from which stakes
are obtained, and so if epoch changes cluster-nodes will be recomputed.

A time-to-live eviction policy is enforced to refresh entries in case
gossip contact-infos are updated.

(cherry picked from commit ecc1c7957f)

* uses cluster-nodes cache in retransmit stage

The new cluster-nodes cache will:
  * ensure cluster-nodes are recalculated if the epoch (and so the epoch
    staked nodes) changes.
  * encapsulate time-to-live eviction policy.

(cherry picked from commit 30bec3921e)

* uses cluster-nodes cache in broadcast-stage

* Current caching mechanism does not update cluster-nodes when the epoch
  (and so epoch staked nodes) changes:
  https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344

* Additionally, the cache update has a concurrency bug in which the
  thread which does compare_and_swap may be blocked when it tries to
  obtain the write-lock on cache, while other threads will keep running
  ahead with the outdated cache (since the atomic timestamp is already
  updated).

In the new ClusterNodesCache, entries are keyed by epoch, and so if
epoch changes cluster-nodes will be recalculated. The time-to-live
eviction policy is also encapsulated and rigidly enforced.

(cherry picked from commit aa32738dd5)

# Conflicts:
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs

* unifies cluster-nodes computation & caching across turbine stages

Broadcast-stage is using epoch_staked_nodes based on the same slot that
shreds belong to:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

But retransmit-stage is using bank-epoch of the working-bank:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289

So the two are not consistent at epoch boundaries where some nodes may
have a working bank (or similarly a root bank) lagging other nodes. As a
result the node which obtains a packet may construct turbine broadcast
tree inconsistently with its parent node in the tree and so some packets
may fail to reach all nodes in the tree.

(cherry picked from commit 50d0e830c9)

* adds fallback & metric for when epoch staked-nodes are none

(cherry picked from commit fb69f45f14)

* allows only one thread to update cluster-nodes cache entry for an epoch

If two threads simultaneously call into ClusterNodesCache::get for the
same epoch, and the cache entry is outdated, then both threads recompute
cluster-nodes for the epoch and redundantly overwrite each other.

This commit wraps ClusterNodesCache entries in Arc<Mutex<...>>, so that
when needed only one thread does the computations to update the entry.

(cherry picked from commit eaf927cf49)

* falls back on working-bank if root-bank::epoch-staked-nodes is none

bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).

At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.

To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.

(cherry picked from commit e4be00fece)

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 23:45:42 +00:00