2442 Commits

Author SHA1 Message Date
Tyera Eulberg
e9bef42502 Bump version to v1.7.18 (#21011) 2021-10-26 23:02:09 -06:00
mergify[bot]
e6bd6dd260 Swap banking stage vote channels (#20987) (#20999)
(cherry picked from commit 261dd96ae3)

Co-authored-by: sakridge <sakridge@gmail.com>
2021-10-26 23:14:22 +00:00
mergify[bot]
5f86735382 Add check for shred data header size (#20668) (#20680)
(cherry picked from commit 588168b99d)

Co-authored-by: sakridge <sakridge@gmail.com>
2021-10-25 01:23:11 +00:00
Justin Starry
e3e1396c1d Bump version to v1.7.17 (#20896) 2021-10-23 09:48:15 -04:00
mergify[bot]
c597a6c828 Report timing info for stakes cache updates from txs (backport #20856) (#20883)
* Report timing info for stakes cache updates from txs (#20856)

(cherry picked from commit 735016661b)

# Conflicts:
#	runtime/src/bank.rs

* resolve conflicts

Co-authored-by: Justin Starry <justin@solana.com>
2021-10-22 15:51:20 -04:00
mergify[bot]
a483a7da47 Add counter for dropped duplicated packets, fix dropped_packets_count (#20834) (#20837)
(cherry picked from commit 71d0bd4605)

Co-authored-by: Tao Zhu <82401714+taozhu-chicago@users.noreply.github.com>
2021-10-21 04:55:11 +00:00
mergify[bot]
e333cd48e8 rpc-send-tx-svc server-side retry knobs (backport #20818) (#20829)
* rpc-send-tx-svc: add with_config constructor

(cherry picked from commit fe098b5ddc)

# Conflicts:
#	Cargo.lock
#	core/Cargo.toml
#	replica-node/Cargo.toml
#	rpc/src/rpc_service.rs
#	rpc/src/send_transaction_service.rs
#	validator/Cargo.toml

* rpc-send-tx-svc: server-side retry knobs

(cherry picked from commit 2744a2128c)

Co-authored-by: Trent Nelson <trent@solana.com>
2021-10-21 00:13:46 +00:00
Michael Vines
779f36a1bd Rework AVX/AVX2 detection again
(cherry picked from commit c16510152e)
2021-10-10 15:28:26 -07:00
Trent Nelson
33c24ec3ae Bump version to 1.7.16 (#20519) 2021-10-07 18:29:40 -06:00
behzad nouri
2d930052dc fixes backports code changes (#20482) 2021-10-07 03:44:30 +00:00
Trent Nelson
e4aecd9320 Revert "Cost model 1.7 (#20188)"
This reverts commit 1dd6dc3709.
2021-10-06 16:25:24 -06:00
Tao Zhu
1dd6dc3709 Cost model 1.7 (#20188)
* Cost Model to limit transactions which are not parallelizeable (#16694)

* * Add following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker atomically try_add operation, serves as safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations

* replay stage feed back program cost (#17731)

* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized into empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.

* investigate system performance test degradation  (#17919)

* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocation when calculate cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at time

* Persist cost table to blockstore (#18123)

* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.

* log warning when channel send fails (#18391)

* Aggregate cost_model into cost_tracker (#18374)

* * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions

* review fixes

* update ledger tool to restore cost table from blockstore (#18489)

* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test

* manually fix merge conflicts

* Per-program id timings (#17554)

* more manual fixing

* solve a merge conflict

* featurize cost model

* more merge fix

* cost model uses compute_unit to replace microsecond as cost unit
(#18934)

* Reject blocks for costs above the max block cost (#18994)

* Update block max cost limit to fix performance regession (#19276)

* replace function with const var for better readability (#19285)

* Add few more metrics data points (#19624)

* periodically report sigverify_stage stats (#19674)

* manual merge

* cost model nits (#18528)

* Accumulate consumed units (#18714)

* tx wide compute budget (#18631)

* more manual merge

* ignore zerorize drop security

* - update const cost values with data collected by #19627
- update cost calculation to closely proposed fee schedule #16984

* add transaction cost histogram metrics (#20350)

* rebase to 1.7.15

* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes

* remove cost_model feature_set

* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
2021-10-06 15:11:41 -05:00
Justin Starry
d922971ec6 Optimize stakes cache and rewards at epoch boundaries (backport #20432) (#20472)
* Optimize stakes cache and rewards at epoch boundaries (backport #20432)

* fix conflicts
2021-10-06 16:15:27 +00:00
Tyera Eulberg
734b380cdb Bump version to v1.7.15 (#20338) 2021-09-30 10:51:34 -06:00
sakridge
474f2bcdf4 Prune sigverify queue (#20315) 2021-09-30 05:40:48 +02:00
sakridge
fec15f69f4 Increment 1.7 version (#20316) 2021-09-29 15:37:45 -04:00
sakridge
257ddbeee1 Tpu vote 1.7 (#20187)
* Add separate vote processing tpu port

* Add feature to send to tpu vote port

* Add vote rejecting sigverify mode

* use packet.meta.is_simple_vote_tx in place of deserialization

* consolidate code that identifies vote tx atcommon path for cpu and gpu

* new key for feature set

* banking forward tpu vote

* add tpu vote port to dockerfile and other review changes

* Simplify thread id compare

* fix a test; updated cluster_info ABI change

Co-authored-by: Tao Zhu <tao@solana.com>
2021-09-29 18:12:58 +02:00
mergify[bot]
47c1730808 uses rayon thread-pool for retransmit-stage parallelization (#19486) (#20293)
(cherry picked from commit 01a7ec8198)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-29 14:11:46 +00:00
mergify[bot]
2f2948f998 Extricate RpcCompletedSlotsService from RetransmitStage (backport #18017) (#20294)
* Extricate RpcCompletedSlotsService from RetransmitStage

(cherry picked from commit fa04531c7a)

# Conflicts:
#	core/src/replay_stage.rs
#	core/src/retransmit_stage.rs
#	core/src/tvu.rs
#	core/src/validator.rs

* removes backport merge conflicts

Co-authored-by: Michael Vines <mvines@gmail.com>
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 18:25:51 +00:00
mergify[bot]
55ccff7604 skips retransmit for shreds with unknown slot leader (backport #19472) (#20291)
* skips retransmit for shreds with unknown slot leader (#19472)

Shreds' signatures should be verified before they reach retransmit
stage, and if the leader is unknown they should fail signature check.
Therefore retransmit-stage can as well expect to know who the slot
leader is and otherwise just skip the shred.

Blockstore checking signature of recovered shreds before sending them to
retransmit stage:
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/blockstore.rs#L884-L930

Shred signature verifier:
https://github.com/solana-labs/solana/blob/4305d4b7b/core/src/sigverify_shreds.rs#L41-L57
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/sigverify_shreds.rs#L105

(cherry picked from commit 6d9818b8e4)

# Conflicts:
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	ledger/src/shred.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 15:26:30 +00:00
mergify[bot]
1bf88556ee removes Slot from TransmitShreds (backport #19327) (#20260)
* removes Slot from TransmitShreds (#19327)

An earlier version of the code was funneling through stakes along with
shreds to broadcast:
https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127

This was changed to only slots as stakes computation was pushed further
down the pipeline in:
https://github.com/solana-labs/solana/pull/18971

However shreds themselves embody which slot they belong to. So pairing
them with slot is redundant and adds rooms for bugs should they become
inconsistent.

(cherry picked from commit 1deb4add81)

# Conflicts:
#	core/benches/cluster_info.rs
#	core/src/broadcast_stage.rs
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 12:55:01 +00:00
mergify[bot]
a1a0c63862 retransmits shreds recovered from erasure codes (backport #19233) (#20249)
* removes packet-count metrics from retransmit stage

Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.

Following commit will add these metrics back to window-service, earlier
in the pipeline.

(cherry picked from commit bf437b0336)

# Conflicts:
#	core/src/retransmit_stage.rs

* adds packet/shred count stats to window-service

Adding back these metrics from the earlier commit which removed them
from retransmit stage.

(cherry picked from commit 8198a7eae1)

* removes erroneous uses of Arc<...> from retransmit stage

(cherry picked from commit 6e413331b5)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/tvu.rs

* sends shreds (instead of packets) to retransmit stage

Working towards channelling through shreds recovered from erasure codes
to retransmit stage.

(cherry picked from commit 3efccbffab)

# Conflicts:
#	core/src/retransmit_stage.rs

* returns completed-data-set-info from insert_data_shred

instead of opaque (u32, u32) which are then converted to
CompletedDataSetInfo at the call-site.

(cherry picked from commit 3c71670bd9)

# Conflicts:
#	ledger/src/blockstore.rs

* retransmits shreds recovered from erasure codes

Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.

(cherry picked from commit 7a8807b8bb)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-27 18:11:37 +00:00
mergify[bot]
502ae8b319 removes repeated bank-forks locking in window-service (backport #19210) (#20239)
* removes repeated bank-forks locking in window-service

Window service is repeatedly locking bank-forks to look-up working-bank
for every single shred:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L597-L606

This commit updates shred_filter signature in recv_window so that where
we already obtain the lock on bank-forks, we can also look-up
working-bank once for all packets:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L256-L277

(cherry picked from commit d57398a959)

# Conflicts:
#	core/src/window_service.rs

* removes erroneous uses of &Arc<...> from window-service

(cherry picked from commit b64eeb7729)

# Conflicts:
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-27 13:48:58 +00:00
mergify[bot]
d68377e927 unifies cluster-nodes computation & caching across turbine stages (backport #18971) (#20231)
* sends slots (instead of stakes) through broadcast flow

Current broadcast code is computing stakes for each slot before sending
them down the channel:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

Since the stakes are a function of epoch the slot belongs to (and so
does not necessarily change from one slot to another), forwarding the
slot itself would allow better caching downstream.

In addition we need to invalidate the cache if the epoch changes (which
the current code does not do), and that requires to know which slot (and
so epoch) current broadcasted shreds belong to:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344

(cherry picked from commit 44b11154ca)

# Conflicts:
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs

* implements cluster-nodes cache

Cluster nodes are cached keyed by the respective epoch from which stakes
are obtained, and so if epoch changes cluster-nodes will be recomputed.

A time-to-live eviction policy is enforced to refresh entries in case
gossip contact-infos are updated.

(cherry picked from commit ecc1c7957f)

* uses cluster-nodes cache in retransmit stage

The new cluster-nodes cache will:
  * ensure cluster-nodes are recalculated if the epoch (and so the epoch
    staked nodes) changes.
  * encapsulate time-to-live eviction policy.

(cherry picked from commit 30bec3921e)

* uses cluster-nodes cache in broadcast-stage

* Current caching mechanism does not update cluster-nodes when the epoch
  (and so epoch staked nodes) changes:
  https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344

* Additionally, the cache update has a concurrency bug in which the
  thread which does compare_and_swap may be blocked when it tries to
  obtain the write-lock on cache, while other threads will keep running
  ahead with the outdated cache (since the atomic timestamp is already
  updated).

In the new ClusterNodesCache, entries are keyed by epoch, and so if
epoch changes cluster-nodes will be recalculated. The time-to-live
eviction policy is also encapsulated and rigidly enforced.

(cherry picked from commit aa32738dd5)

# Conflicts:
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs

* unifies cluster-nodes computation & caching across turbine stages

Broadcast-stage is using epoch_staked_nodes based on the same slot that
shreds belong to:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

But retransmit-stage is using bank-epoch of the working-bank:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289

So the two are not consistent at epoch boundaries where some nodes may
have a working bank (or similarly a root bank) lagging other nodes. As a
result the node which obtains a packet may construct turbine broadcast
tree inconsistently with its parent node in the tree and so some packets
may fail to reach all nodes in the tree.

(cherry picked from commit 50d0e830c9)

* adds fallback & metric for when epoch staked-nodes are none

(cherry picked from commit fb69f45f14)

* allows only one thread to update cluster-nodes cache entry for an epoch

If two threads simultaneously call into ClusterNodesCache::get for the
same epoch, and the cache entry is outdated, then both threads recompute
cluster-nodes for the epoch and redundantly overwrite each other.

This commit wraps ClusterNodesCache entries in Arc<Mutex<...>>, so that
when needed only one thread does the computations to update the entry.

(cherry picked from commit eaf927cf49)

* falls back on working-bank if root-bank::epoch-staked-nodes is none

bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).

At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.

To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.

(cherry picked from commit e4be00fece)

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 23:45:42 +00:00
mergify[bot]
e9a993fb59 allows sendmmsg api taking owned values (as well as references) (#18999) (#20226)
Current signature of api in sendmmsg requires a slice of inner
references:
https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152

That forces the call-site to convert owned values to references even
though doing so is redundant and adds an extra level of indirection:
https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291

This commit expands the api using AsRef and Borrow traits to allow
calling the method with owned values (as well as references like
before).

(cherry picked from commit 049fb0417f)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 20:35:13 +00:00
mergify[bot]
0ec301f1c3 improves parallelism in window-service recv_window (backport #18446) (#20142)
* sends packets in batches from sigverify-stage (#18446)

sigverify-stage is breaking batches to single-item vectors before
sending them down the channel:
https://github.com/solana-labs/solana/blob/d451363dc/core/src/sigverify_stage.rs#L88-L92

Also simplifying window-service code, reducing number of nested branches.

(cherry picked from commit 7d56fa8363)

# Conflicts:
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 18:54:07 +00:00
mergify[bot]
34665571fa drop outstanding_requests lock before sending repair requests (backport #18893) (#20227)
* drop outstanding_requests lock before sending repair requests (#18893)

(cherry picked from commit 9255ae334d)

# Conflicts:
#	core/src/repair_service.rs

* removes backport merge conflicts

Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 18:39:28 +00:00
mergify[bot]
5dd1c2191e shares cluster-nodes between retransmit threads (backport #18947) (#20221)
* shares cluster-nodes between retransmit threads (#18947)

cluster_nodes and last_peer_update are not shared between retransmit
threads, as each thread have its own value:
https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477

Additionally, with shared references, this code:
https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328
has a concurrency bug where the thread which does compare_and_swap,
updates cluster_nodes much later after other threads have run with
outdated cluster_nodes for a while. In particular, the write-lock there
may block.

(cherry picked from commit d06dc6c8a6)

# Conflicts:
#	core/benches/retransmit_stage.rs
#	core/src/retransmit_stage.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 16:29:34 +00:00
mergify[bot]
aacb5e58ad sendmmsg cleanup #18589 (#20175)
Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.

(cherry picked from commit ae5ad5cf9b)

Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
2021-09-25 23:00:00 +00:00
mergify[bot]
597c504c27 generate deterministic seeds for shreds (backport #17950) (#20172)
* generate deterministic seeds for shreds (#17950)

* generate shred seed from leader pubkey

* clippy

* clippy

* review

* review 2

* fmt

* review

* check

* review

* cleanup

* fmt

(cherry picked from commit a86ced0bac)

# Conflicts:
#	core/benches/cluster_info.rs
#	core/src/broadcast_stage.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs
#	ledger/src/shred.rs
#	sdk/src/feature_set.rs

* removes backport merge conflicts

Co-authored-by: jbiseda <jbiseda@gmail.com>
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-25 19:09:49 +00:00
mergify[bot]
99c74c8902 Limit transaction forwarding from banking_stage (#19940) (#20081)
(cherry picked from commit 013e1d9d49)

Co-authored-by: sakridge <sakridge@gmail.com>
2021-09-24 18:04:22 +00:00
mergify[bot]
b112e4a8aa windows: Make solana-test-validator work (backport #20099) (#20123)
* windows: Make solana-test-validator work (#20099)

* windows: Make solana-test-validator work

The important changes to get this going on Windows:

* ledger lock needs to be done on a file instead of the directory
* IPC service needs to use the Windows pipe naming scheme
* always disable the JIT
* file logging not possible yet because we can't redirect stderr,
but this will change once env_logger fixes the pipe output target!

* Integrate review feedback

(cherry picked from commit 567f30aa1a)

# Conflicts:
#	validator/src/bin/solana-test-validator.rs
#	validator/src/lib.rs
#	validator/src/main.rs

* Fix merge conflicts

Co-authored-by: Jon Cinque <jon.cinque@gmail.com>
2021-09-24 12:59:12 +00:00
carllin
49a6acffca Add metric for duplicate retransmit (#20129) 2021-09-22 22:51:44 -07:00
mergify[bot]
03da3eaa81 Optimize RPC pubsub for multiple clients with the same subscription (backport #18943) (#19987)
* Optimize RPC pubsub for multiple clients with the same subscription (#18943)

* reimplement rpc pubsub with a broadcast queue

* update tests for new pubsub implementation

* fix: fix review suggestions

* chore(rpc): add additional pubsub metrics

* integrate max subscriptions check into SubscriptionTracker to reduce locking

* separate subscription control from tracker

* limit memory usage of items in pubsub broadcast queue, improve error handling

* add more pubsub metrics

* add final count metrics to pubsub

* add metric for total number of subscriptions

* fix small review suggestions

* remove by_params from SubscriptionTracker and add node_progress_watchers map instead

* add subscription tracker tests

* add metrics for number of pubsub notifications as a counter

* ignore clippy lint in TokenCounter

* fix underflow in token counter

* reduce queue capacity in pubsub tests

* fix(rpc): fix test timeouts

* fix race in account subscription test

* Add RpcSubscriptions::new_for_tests

Co-authored-by: Pavel Strakhov <p.strakhov@iconic.vc>
Co-authored-by: Nikita Podoliako <n.podoliako@zubr.io>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
(cherry picked from commit 65227f44dc)

# Conflicts:
#	Cargo.lock
#	core/Cargo.toml
#	core/src/replay_stage.rs
#	core/src/validator.rs
#	replica-node/src/replica_node.rs
#	rpc/Cargo.toml

* Fix conflicts (and standardize naming to make future subscription backports easier

Co-authored-by: Pavel Strakhov <ri@idzaaus.org>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-09-20 06:00:08 +00:00
mergify[bot]
53b387f113 Report consumed_buffered_packets_count stat to metrics (backport #19900) (#19915)
* Report consumed_buffered_packets_count stat to metrics (#19900)

(cherry picked from commit 34c1a9ac85)

# Conflicts:
#	core/src/banking_stage.rs

* fix conflicts

Co-authored-by: Justin Starry <justin@solana.com>
2021-09-18 20:57:28 +00:00
sakridge
70d556782b Bump 1.7 version (#19943) 2021-09-16 13:16:09 -06:00
sakridge
ca83167cfc Only allow votes when root distance gets too high (#19933) 2021-09-16 16:48:51 +02:00
mergify[bot]
80e239550b Add banking metrics for buffered and dropped packets (#19902) (#19927)
(cherry picked from commit ca3f147670)

Co-authored-by: Justin Starry <justin@solana.com>
2021-09-16 00:59:27 +00:00
Michael
e51c2d1a84 Add an info log to indicate the node has reached supermajority and print the active stake percentage (#19893)
(cherry picked from commit 4ff50519ff)
2021-09-15 01:59:01 -07:00
Tyera Eulberg
a24b0dc81c Use f64 for stake math in get_stake_percent_in_gossip (#19895)
(cherry picked from commit c91519961c)
2021-09-15 01:58:52 -07:00
behzad nouri
7e5026bde2 removes backport merge conflicts 2021-09-09 23:33:29 -07:00
behzad nouri
7ac0ea0885 filters for recent contact-infos when checking for live stake (#19204)
Contact-infos are saved to disk:
https://github.com/solana-labs/solana/blob/9dfeee299/gossip/src/cluster_info.rs#L1678-L1683

and restored on validator start-up:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L450

Staked nodes entries will not expire until an epoch after. So when the
validator checks for online stake it is erroneously picking up
contact-infos restored from disk, which breaks the entire
wait-for-supermajority logic:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L1515-L1561

This commit adds an extra check for the age of contact-info entries and
filters out old ones.

(cherry picked from commit 7a789e0763)

# Conflicts:
#	core/src/validator.rs
2021-09-09 23:33:29 -07:00
Michael Vines
dc06d3dee5 Reduce wait for supermajority threshold back to 80%
(cherry picked from commit 4386e09710)
2021-09-09 21:45:00 -07:00
mergify[bot]
aa2098d115 Write helper for multithread update (#18808) (#19282)
Co-authored-by: sakridge <sakridge@gmail.com>
2021-09-02 11:05:15 +00:00
mergify[bot]
744a69f818 Fix shreds-to-hours/days estimations (#19477) (#19503)
(cherry picked from commit a3bef2e537)

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-08-30 22:34:13 +00:00
Tyera Eulberg
f73a61d2ec Bump version to 1.7.12 2021-08-27 16:24:24 +00:00
mergify[bot]
52dfb4a09c Bump jsonrpc crates and remove old tokio (backport #18779) (#19453)
* Bump jsonrpc crates and remove old tokio (#18779)

* Bump jsonrpc crates and replace old tokio

* Bump tokio

* getBlockTime

* getBlocks

* getBlocksWithLimit, getInflationReward

* getBlock

* getFirstAvailableBlock

* getTransaction

* getSignaturesForAddress

* getSignatureStatuses

* Remove superfluous runtime

(cherry picked from commit 8596db8f53)

# Conflicts:
#	Cargo.lock
#	client/Cargo.toml
#	core/Cargo.toml
#	programs/bpf/Cargo.lock
#	rpc/Cargo.toml
#	rpc/src/rpc.rs
#	validator/Cargo.toml

* Fix conflicts

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-08-27 00:55:02 +00:00
mergify[bot]
caea9c99cd Minimize trust (backport #19279) (#19335)
* docs: Mainnet Beta inflation has been enabled for quite some time

(cherry picked from commit 169ded9a70)

* validator: Trusted validators are now called known validators

(cherry picked from commit e0bc5fa690)

* docs: trust minimize

(cherry picked from commit 40613161a0)

* docs: correct known validator operator

(cherry picked from commit eced50d103)

* docs: Remove decommissioned testnet archetype validator

(cherry picked from commit a587eec20b)

* docs: update devnet start args with new validators

(cherry picked from commit 2a877ae06e)

Co-authored-by: Trent Nelson <trent@solana.com>
2021-08-20 06:24:43 +00:00
mergify[bot]
1495b94f7a discards epoch-slots epochs ahead of the current root (#19256) (#19328)
Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
https://github.com/solana-labs/solana/issues/17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
https://github.com/solana-labs/solana/issues/17789#issuecomment-896304969

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots epochs ahead of the
root.

(cherry picked from commit 563aec0b4d)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-08-19 23:35:34 +00:00
mergify[bot]
f55bb78307 updates cluster-slots with root-bank instead of root-slot + bank-forks (backport #19058) (#19324)
* removes unused code from cluster-slots

(cherry picked from commit 2fc112edcf)

# Conflicts:
#	core/src/cluster_slots.rs

* updates cluster-slots with root-bank instead of root-slot + bank-forks

ClusterSlots::update is taking both root-slot and bank-forks only to
later lookup root-bank from bank-forks, which is redundant. Also
potentially by the time bank-forks is locked to obtain root-bank,
root-slot may have already changed and so be inconsistent with the
root-slot passed in as the argument.
https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L32-L39
https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L122

(cherry picked from commit 40914de811)

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-08-19 20:52:41 +00:00