Commit Graph

164 Commits

Author SHA1 Message Date
mergify[bot]
00abcbe1be Introduce slot-specific packet metrics (backport #22906) (#23076)
* Resolve conflicts

* add end_of_slot_filtered_invalid_count

* Increment end_of_slot_filtered_invalid_count

* Fixup comment

* Remove comment

* Move all process_tx metris into common function

* Switch to saturating_add_assign macro

* Refactor timings so each struct reports own timing

* Move into accumulate

* Remove unnecessary struct

Co-authored-by: Carl Lin <carl@solana.com>
2022-02-14 10:23:48 +00:00
mergify[bot]
1570026c2c tracks erasure coding shreds indices explicitly (backport #21822) (#22969)
* tracks erasure coding shreds' indices explicitly (#21822)

The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921

However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.

The commit adds constructs to track coding shreds indices explicitly.

(cherry picked from commit 65d59f4ef0)

# Conflicts:
#	core/benches/retransmit_stage.rs
#	core/benches/shredder.rs
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/broadcast_fake_shreds_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/window_service.rs
#	ledger/src/blockstore.rs
#	ledger/src/shred.rs
#	ledger/tests/shred.rs

* removes mergify merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2022-02-06 02:42:32 +00:00
mergify[bot]
89ec164787 removes next_shred_index from return value of entries to shreds api (backport #21961) (#22965)
* removes next_shred_index from return value of entries to shreds api (#21961)

next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.

(cherry picked from commit 89d66c3210)

# Conflicts:
#	core/benches/shredder.rs
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/broadcast_fake_shreds_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs
#	gossip/src/duplicate_shred.rs
#	ledger/src/blockstore.rs
#	ledger/src/shred.rs
#	ledger/tests/shred.rs

* removes mergify merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2022-02-05 23:05:40 +00:00
mergify[bot]
78b82dedb1 improves sigverify discard_excess_packets performance (backport #22577) (#22579)
* improves sigverify discard_excess_packets performance (#22577)

As shown by the added benchmark, current code does worse if there is a
spam address plus a lot of unique addresses.

on current master:
test bench_packet_discard_many_senders  ... bench:   1,997,960 ns/iter (+/- 103,715)
test bench_packet_discard_mixed_senders ... bench:  14,256,116 ns/iter (+/- 534,865)
test bench_packet_discard_single_sender ... bench:   1,306,809 ns/iter (+/- 61,992)

with this commit:
test bench_packet_discard_many_senders  ... bench:   1,644,025 ns/iter (+/- 83,715)
test bench_packet_discard_mixed_senders ... bench:   1,089,789 ns/iter (+/- 86,324)
test bench_packet_discard_single_sender ... bench:     955,234 ns/iter (+/- 55,953)

(cherry picked from commit dcf44d2523)

# Conflicts:
#	core/benches/sigverify_stage.rs
#	core/src/sigverify_stage.rs

* removes mergify merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2022-01-19 21:23:34 +00:00
Tao Zhu
fc8f61368d Revert "Added vote limits to be 75% of total block limit"
This reverts commit 3e131a5324.
2022-01-18 15:10:49 -07:00
mergify[bot]
81e65eae0b Use VecDeque instead of Vec in sigverify stage (#22538) (#22549)
avoid bad performance of remove(0) for a single sender

(cherry picked from commit 49443406fd)

# Conflicts:
#	core/src/sigverify_stage.rs

Co-authored-by: sakridge <sakridge@gmail.com>
2022-01-17 22:19:25 +00:00
Tao Zhu
3e131a5324 Added vote limits to be 75% of total block limit 2022-01-14 10:49:17 -06:00
mergify[bot]
039244417e removes redundant args from Shredder::try_recovery (backport #21226) (#22126)
* removes redundant args from Shredder::try_recovery (#21226)

Shredder::try_recovery is already taking a Vec<Shred> as an argument. All the
other arguments are embedded in the shreds, and are so redundant.

(cherry picked from commit 5fb0ab9d00)

# Conflicts:
#	ledger/src/shred.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-12-27 21:02:33 +00:00
Tyera Eulberg
a0669af872 Revert "Revert "Rename Packets to PacketBatch (backport #21794) (#21804)""
This reverts commit 13d40d6a66.
2021-12-16 19:28:48 -07:00
Tyera Eulberg
9f53f3455a Revert "Revert "Reformat imports to a consistent style for imports""
This reverts commit d7377d4794.
2021-12-16 19:28:48 -07:00
Trent Nelson
d7377d4794 Revert "Reformat imports to a consistent style for imports"
This reverts commit 139d15cd84.
2021-12-13 12:46:23 -06:00
Trent Nelson
13d40d6a66 Revert "Rename Packets to PacketBatch (backport #21794) (#21804)"
This reverts commit 39e27b130f.
2021-12-13 12:46:23 -06:00
mergify[bot]
39e27b130f Rename Packets to PacketBatch (backport #21794) (#21804)
* Rename Packets to PacketBatch (#21794)

(cherry picked from commit 254ef3e7b6)

# Conflicts:
#	core/src/ancestor_hashes_service.rs
#	core/src/banking_stage.rs
#	core/src/cluster_info_vote_listener.rs
#	core/src/fetch_stage.rs
#	core/src/serve_repair.rs
#	core/src/shred_fetch_stage.rs
#	core/src/sigverify_stage.rs
#	core/src/verified_vote_packets.rs
#	core/src/window_service.rs
#	gossip/src/cluster_info.rs
#	ledger/src/entry.rs
#	streamer/src/streamer.rs

* resolve conflicts

Co-authored-by: Justin Starry <justin@solana.com>
2021-12-11 17:32:17 +00:00
Michael Vines
139d15cd84 Reformat imports to a consistent style for imports
rustfmt.toml configuration:
        imports_granularity = "One"
         group_imports = "One"
2021-12-03 09:41:09 -08:00
mergify[bot]
de1f60fb2d Refactor cost tracker metrics reporting (backport #20802) (#20933)
* - cost_tracker is data member of a bank, it can report metrics when bank is frozen (#20802)

- removed cost_tracker_stats and histogram
- move stats reporting outside of bank freeze

(cherry picked from commit c2bfce90b3)

# Conflicts:
#	Cargo.lock
#	core/src/banking_stage.rs
#	core/src/replay_stage.rs
#	core/src/tvu.rs
#	ledger-tool/src/main.rs
#	programs/bpf/Cargo.lock
#	runtime/Cargo.toml
#	runtime/src/cost_tracker.rs

* manual fix merge conflicts

Co-authored-by: Tao Zhu <82401714+taozhu-chicago@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
2021-10-27 16:48:20 +00:00
mergify[bot]
28eb6ff796 Invoke cost tracker from its bank (backport #20627) (#20800)
* - make cost_tracker a member of bank, remove shared instance from TPU; (#20627)

- decouple cost_model from cost_tracker; allowing one cost_model
  instance being shared within a validator;
- update cost_model api to calculate_cost(&self...)->transaction_cost

(cherry picked from commit 7496b5784b)

# Conflicts:
#	core/src/banking_stage.rs
#	ledger-tool/src/main.rs
#	runtime/src/bank.rs
#	runtime/src/cost_model.rs
#	runtime/src/cost_tracker.rs

* manual fix merge conflicts

Co-authored-by: Tao Zhu <82401714+taozhu-chicago@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
2021-10-20 00:22:38 +00:00
mergify[bot]
400a88786a aggregate cost_tracker to bank (backport #20527) (#20622)
* - move cost tracker into bank, so each bank has its own cost tracker; (#20527)

- move related modules to runtime

(cherry picked from commit 005d6863fd)

# Conflicts:
#	Cargo.lock
#	core/benches/banking_stage.rs
#	core/src/banking_stage.rs
#	core/src/lib.rs
#	core/src/tvu.rs
#	ledger-tool/src/main.rs
#	ledger/src/blockstore_processor.rs
#	programs/bpf/Cargo.lock
#	runtime/Cargo.toml
#	runtime/src/cost_model.rs

* manual fix merge conflicts

Co-authored-by: Tao Zhu <82401714+taozhu-chicago@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
2021-10-13 05:07:09 +00:00
behzad nouri
c693ecc4c8 fixes backports code changes (#20541) 2021-10-08 15:30:27 +00:00
Tao Zhu
db85d659b9 Cost model 1.7 (#20188)
* Cost Model to limit transactions which are not parallelizeable (#16694)

* * Add following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker atomically try_add operation, serves as safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations

* replay stage feed back program cost (#17731)

* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized into empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.

* investigate system performance test degradation  (#17919)

* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocation when calculate cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at time

* Persist cost table to blockstore (#18123)

* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.

* log warning when channel send fails (#18391)

* Aggregate cost_model into cost_tracker (#18374)

* * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions

* review fixes

* update ledger tool to restore cost table from blockstore (#18489)

* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test

* manually fix merge conflicts

* Per-program id timings (#17554)

* more manual fixing

* solve a merge conflict

* featurize cost model

* more merge fix

* cost model uses compute_unit to replace microsecond as cost unit
(#18934)

* Reject blocks for costs above the max block cost (#18994)

* Update block max cost limit to fix performance regession (#19276)

* replace function with const var for better readability (#19285)

* Add few more metrics data points (#19624)

* periodically report sigverify_stage stats (#19674)

* manual merge

* cost model nits (#18528)

* Accumulate consumed units (#18714)

* tx wide compute budget (#18631)

* more manual merge

* ignore zerorize drop security

* - update const cost values with data collected by #19627
- update cost calculation to closely proposed fee schedule #16984

* add transaction cost histogram metrics (#20350)

* rebase to 1.7.15

* add tx count and thread id to stats (#20451)
each stat reports and resets when slot changes

* remove cost_model feature_set

* ignore vote transactions from cost model

Co-authored-by: sakridge <sakridge@gmail.com>
Co-authored-by: Jeff Biseda <jbiseda@gmail.com>
Co-authored-by: Jack May <jack@solana.com>
2021-10-06 15:55:29 -06:00
sakridge
474f2bcdf4 Prune sigverify queue (#20315) 2021-09-30 05:40:48 +02:00
sakridge
257ddbeee1 Tpu vote 1.7 (#20187)
* Add separate vote processing tpu port

* Add feature to send to tpu vote port

* Add vote rejecting sigverify mode

* use packet.meta.is_simple_vote_tx in place of deserialization

* consolidate code that identifies vote tx atcommon path for cpu and gpu

* new key for feature set

* banking forward tpu vote

* add tpu vote port to dockerfile and other review changes

* Simplify thread id compare

* fix a test; updated cluster_info ABI change

Co-authored-by: Tao Zhu <tao@solana.com>
2021-09-29 18:12:58 +02:00
mergify[bot]
47c1730808 uses rayon thread-pool for retransmit-stage parallelization (#19486) (#20293)
(cherry picked from commit 01a7ec8198)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-29 14:11:46 +00:00
mergify[bot]
1bf88556ee removes Slot from TransmitShreds (backport #19327) (#20260)
* removes Slot from TransmitShreds (#19327)

An earlier version of the code was funneling through stakes along with
shreds to broadcast:
https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127

This was changed to only slots as stakes computation was pushed further
down the pipeline in:
https://github.com/solana-labs/solana/pull/18971

However shreds themselves embody which slot they belong to. So pairing
them with slot is redundant and adds rooms for bugs should they become
inconsistent.

(cherry picked from commit 1deb4add81)

# Conflicts:
#	core/benches/cluster_info.rs
#	core/src/broadcast_stage.rs
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-28 12:55:01 +00:00
mergify[bot]
a1a0c63862 retransmits shreds recovered from erasure codes (backport #19233) (#20249)
* removes packet-count metrics from retransmit stage

Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.

Following commit will add these metrics back to window-service, earlier
in the pipeline.

(cherry picked from commit bf437b0336)

# Conflicts:
#	core/src/retransmit_stage.rs

* adds packet/shred count stats to window-service

Adding back these metrics from the earlier commit which removed them
from retransmit stage.

(cherry picked from commit 8198a7eae1)

* removes erroneous uses of Arc<...> from retransmit stage

(cherry picked from commit 6e413331b5)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/tvu.rs

* sends shreds (instead of packets) to retransmit stage

Working towards channelling through shreds recovered from erasure codes
to retransmit stage.

(cherry picked from commit 3efccbffab)

# Conflicts:
#	core/src/retransmit_stage.rs

* returns completed-data-set-info from insert_data_shred

instead of opaque (u32, u32) which are then converted to
CompletedDataSetInfo at the call-site.

(cherry picked from commit 3c71670bd9)

# Conflicts:
#	ledger/src/blockstore.rs

* retransmits shreds recovered from erasure codes

Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.

(cherry picked from commit 7a8807b8bb)

# Conflicts:
#	core/src/retransmit_stage.rs
#	core/src/window_service.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-27 18:11:37 +00:00
mergify[bot]
5dd1c2191e shares cluster-nodes between retransmit threads (backport #18947) (#20221)
* shares cluster-nodes between retransmit threads (#18947)

cluster_nodes and last_peer_update are not shared between retransmit
threads, as each thread have its own value:
https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477

Additionally, with shared references, this code:
https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328
has a concurrency bug where the thread which does compare_and_swap,
updates cluster_nodes much later after other threads have run with
outdated cluster_nodes for a while. In particular, the write-lock there
may block.

(cherry picked from commit d06dc6c8a6)

# Conflicts:
#	core/benches/retransmit_stage.rs
#	core/src/retransmit_stage.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-26 16:29:34 +00:00
mergify[bot]
597c504c27 generate deterministic seeds for shreds (backport #17950) (#20172)
* generate deterministic seeds for shreds (#17950)

* generate shred seed from leader pubkey

* clippy

* clippy

* review

* review 2

* fmt

* review

* check

* review

* cleanup

* fmt

(cherry picked from commit a86ced0bac)

# Conflicts:
#	core/benches/cluster_info.rs
#	core/src/broadcast_stage.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs
#	ledger/src/shred.rs
#	sdk/src/feature_set.rs

* removes backport merge conflicts

Co-authored-by: jbiseda <jbiseda@gmail.com>
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-09-25 19:09:49 +00:00
mergify[bot]
b112e4a8aa windows: Make solana-test-validator work (backport #20099) (#20123)
* windows: Make solana-test-validator work (#20099)

* windows: Make solana-test-validator work

The important changes to get this going on Windows:

* ledger lock needs to be done on a file instead of the directory
* IPC service needs to use the Windows pipe naming scheme
* always disable the JIT
* file logging not possible yet because we can't redirect stderr,
but this will change once env_logger fixes the pipe output target!

* Integrate review feedback

(cherry picked from commit 567f30aa1a)

# Conflicts:
#	validator/src/bin/solana-test-validator.rs
#	validator/src/lib.rs
#	validator/src/main.rs

* Fix merge conflicts

Co-authored-by: Jon Cinque <jon.cinque@gmail.com>
2021-09-24 12:59:12 +00:00
mergify[bot]
aa2098d115 Write helper for multithread update (#18808) (#19282)
Co-authored-by: sakridge <sakridge@gmail.com>
2021-09-02 11:05:15 +00:00
mergify[bot]
eacc69efba adds validator flag to allow private ip addresses (backport #18850) (#18975)
* adds validator flag to allow private ip addresses (#18850)

(cherry picked from commit d2d5f36a3c)

# Conflicts:
#	accounts-cluster-bench/Cargo.toml
#	bench-tps/Cargo.toml
#	cli/Cargo.toml
#	core/benches/cluster_info.rs
#	core/src/banking_stage.rs
#	core/src/broadcast_stage.rs
#	core/src/broadcast_stage/broadcast_duplicates_run.rs
#	core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs
#	core/src/broadcast_stage/standard_broadcast_run.rs
#	core/src/cluster_slots_service.rs
#	core/src/repair_service.rs
#	core/src/tvu.rs
#	core/src/validator.rs
#	dos/Cargo.toml
#	gossip/src/cluster_info.rs
#	gossip/src/crds_gossip_pull.rs
#	gossip/src/crds_gossip_push.rs
#	gossip/src/gossip_service.rs
#	local-cluster/Cargo.toml
#	local-cluster/src/cluster_tests.rs
#	local-cluster/tests/local_cluster.rs
#	rpc/Cargo.toml
#	rpc/src/rpc.rs
#	tokens/Cargo.toml
#	validator/Cargo.toml
#	validator/src/main.rs

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-29 21:43:24 +00:00
mergify[bot]
c534c928a7 encapsulates turbine peers computations of broadcast & retransmit stages (#18238) (#18464)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.

(cherry picked from commit 04787be8b1)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-07-07 14:30:55 +00:00
mergify[bot]
0e7512a225 Fix Nightly Clippy Warnings (backport #18065) (#18070)
* chore: cargo +nightly clippy --fix -Z unstable-options

(cherry picked from commit 6514096a67)

# Conflicts:
#	core/src/banking_stage.rs
#	core/src/cost_model.rs
#	core/src/cost_tracker.rs
#	core/src/execute_cost_table.rs
#	core/src/replay_stage.rs
#	core/src/tvu.rs
#	ledger-tool/src/main.rs
#	programs/bpf_loader/build.rs
#	rbpf-cli/src/main.rs
#	sdk/cargo-build-bpf/src/main.rs
#	sdk/cargo-test-bpf/src/main.rs
#	sdk/src/secp256k1_instruction.rs

* chore: cargo fmt

(cherry picked from commit 789f33e8db)

* Updates BPF program assert_instruction_count tests.

(cherry picked from commit c1e03f3410)

# Conflicts:
#	programs/bpf/tests/programs.rs

* Resolve conflicts

Co-authored-by: Alexander Meißner <AlexanderMeissner@gmx.net>
Co-authored-by: Michael Vines <mvines@gmail.com>
2021-06-18 20:02:48 +00:00
mergify[bot]
ef205593c5 removes port-based forwarding logic from turbine retransmit (#17716) (#17973)
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
https://github.com/solana-labs/solana/issues/6672
https://github.com/solana-labs/solana/pull/7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
https://github.com/solana-labs/solana/pull/17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.

(cherry picked from commit 161838655c)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
2021-06-15 15:16:20 +00:00
mergify[bot]
e247625025 Create solana-poh and move remaining rpc modules to solana-rpc (backport #17698) (#17745)
* Create solana-poh and move remaining rpc modules to solana-rpc (#17698)

* Create solana-poh crate

* Move BigTableUploadService to solana-ledger

* Add solana-rpc to workspace

* Move dependencies to solana-rpc

* Move remaining rpc modules to solana-rpc

* Single use statement solana-poh

* Single use statement solana-rpc

(cherry picked from commit 544b3c0d17)

# Conflicts:
#	Cargo.lock
#	banking-bench/Cargo.toml
#	core/Cargo.toml
#	core/benches/banking_stage.rs
#	local-cluster/Cargo.toml
#	rpc/Cargo.toml
#	stake-monitor/Cargo.toml
#	validator/Cargo.toml

* Fix conflicts & versions

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-06-04 18:19:08 +00:00
Tyera Eulberg
9a5330b7eb Move gossip modules into solana-gossip crate (#17352)
* Move gossip modules to solana-gossip

* Update Protocol abi digest due to move

* Move gossip benches and hook up CI

* Remove unneeded Result entries

* Single use statements
2021-05-26 09:15:46 -06:00
behzad nouri
9d112cf41f encapsulates purged values bookkeeping into crds module (#17265)
For all code paths (gossip push, pull, purge, etc) that remove or
override a crds value, it is necessary to record hash of values purged
from crds table, in order to exclude them from subsequent pull-requests;
otherwise the next pull request will likely return outdated values,
wasting bandwidth:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491

Currently this is done all over the place in multiple modules, and this
has caused bugs in the past where purged values were not recorded.

This commit encapsulated this bookkeeping into crds module, so that any
code path which removes or overrides a crds value, also records the hash
of purged value in-place.
2021-05-24 13:47:21 +00:00
Tyera Eulberg
827355a6b1 Create solana-rpc crate and move subscriptions (#17320)
* Move non_circulating_supply to runtime

* Add solana-rpc crate and move max_slots

* Move subscriptions to solana-rpc

* Single use statements
2021-05-19 00:54:28 -06:00
behzad nouri
1ac2a8cfa5 removes delayed crds inserts when upserting gossip table (#16806)
It is crucial that VersionedCrdsValue::insert_timestamp does not go
backward in time:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L67-L79

Otherwise methods such as get_votes and get_epoch_slots_since will
break, which will break their downstream flow, including vote-listener
and optimistic confirmation:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298

For that, Crds::new_versioned is intended to be called "atomically" with
Crds::insert_verioned (as the comment already says so):
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L126-L129

However, currently this is violated in the code. For example,
filter_pull_responses creates VersionedCrdsValues (with the current
timestamp), then acquires an exclusive lock on gossip, then
process_pull_responses writes those values to the crds table:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L2375-L2392

Depending on the workload and lock contention, the insert_timestamps may
well be in the past when these values finally are inserted into gossip.

To avoid such scenarios, this commit:
  * removes Crds::new_versioned and Crd::insert_versioned.
  * makes VersionedCrdsValue constructor private, only invoked in
    Crds::insert, so that insert_timestamp is populated right before
    insert.

This will improve insert_timestamp monotonicity as long as Crds::insert
is not called with a stalled timestamp. Following commits may further
improve this by calling timestamp() inside Crds::insert, and/or
switching to std::time::Instant which guarantees monotonicity.
2021-04-28 11:56:13 +00:00
behzad nouri
03194145c0 removes first_coding_index from erasure recovery code (#16646)
first_coding_index is the same as the set_index and is so redundant:
https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/blockstore_meta.rs#L49-L60
2021-04-23 12:00:37 +00:00
behzad nouri
37b8587d4e expands number of erasure coding shreds in the last batch in slots (#16484)
Number of parity coding shreds is always less than the number of data
shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719

Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714

However the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.

This commit expands the number of coding shreds in the last FEC block in
slots to: 64 - number of data shreds; so that FEC blocks are always 64
data and parity coding shreds each.

As a consequence of this, the last FEC block has more parity coding
shreds than data shreds. So for some shred indices we will have a coding
shred but no data shreds. This should not cause any kind of overlapping
FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
2021-04-21 12:47:50 +00:00
steviez
bb24318ef0 Document shreds (#16514)
No functionality changes from this commit
2021-04-16 14:04:46 -05:00
Justin Starry
85eb37fab0 Merge pull request from GHSA-8v47-8c53-wwrc
* Track transaction check time separately from account loads

* banking packet process metrics

* Remove signature clone in status cache lookup

* Reduce allocations when converting packets to transactions

* Add blake3 hash of transaction messages in status cache

* Bug fixes

* fix tests and run fmt

* Address feedback

* fix simd tx entry verification

* Fix rebase

* Feedback

* clean up

* Add tests

* Remove feature switch and fall back to signature check

* Bump programs/bpf Cargo.lock

* clippy

* nudge benches

* Bump `BankSlotDelta` frozen ABI hash`

* Add blake3 to sdk/programs/Cargo.lock

* nudge bpf tests

* short circuit status cache checks

Co-authored-by: Trent Nelson <trent@solana.com>
2021-04-13 00:28:08 -06:00
behzad nouri
3f63ed9a72 removes OrderedIterator and transaction batch iteration order (#16153)
In TransactionBatch,
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/transaction_batch.rs#L4-L11
lock_results[i] is aligned with transactions[iteration_order[i]]:
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/bank.rs#L2414-L2424
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/accounts.rs#L788-L817

However load_and_execute_transactions is iterating over
  lock_results[iteration_order[i]]
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/bank.rs#L2878-L2889
and then returning i as for the index of the retryable transaction.

If iteratorion_order is [1, 2, 0], and i is 0, then:
  lock_results[iteration_order[i]] = lock_results[1]
which corresponds to
  transactions[iteration_order[1]] = transactions[2]
so neither i = 0, nor iteration_order[i] = 1 gives the correct index for the
corresponding transaction (which is 2).

This commit removes OrderedIterator and transaction batch iteration order
entirely. There is only one place in blockstore processor which the
iteration order is not ordinal:
https://github.com/solana-labs/solana/blob/e50f59844/ledger/src/blockstore_processor.rs#L269-L271
It seems like, instead of using an iteration order, that can shuffle entry
transactions in-place.
2021-03-31 23:59:19 +00:00
behzad nouri
4f82b897bc buffers data shreds to make larger erasure coded sets (#15849)
Broadcast stage batches up to 8 entries:
https://github.com/solana-labs/solana/blob/79280b304/core/src/broadcast_stage/broadcast_utils.rs#L26-L29
which will be serialized into some number of shreds and chunked into FEC
sets of at most 32 shreds each:
https://github.com/solana-labs/solana/blob/79280b304/ledger/src/shred.rs#L576-L597
So depending on the size of entries, FEC sets can be small, which may
aggravate loss rate.
For example 16 FEC sets of 2:2 data/code shreds each have higher loss
rate than one 32:32 set.

This commit broadcasts data shreds immediately, but also buffers them
until it has a batch of 32 data shreds, at which point 32 coding shreds
are generated and broadcasted.
2021-03-23 14:52:38 +00:00
Jeff Washington (jwash)
57ba86c821 eliminate lock on record (#15929)
* eliminate lock on record

* use same error as MaxHeightReached

* clippy

* review feedback

* refactor should_tick code

* pr feedback
2021-03-23 09:10:04 -05:00
carllin
c1ba265dd9 Wallclock BankingStage Throttle (#15731) 2021-03-15 17:11:15 -07:00
sakridge
d09112fa6d PoH batch size calibration (#15717) 2021-03-05 16:01:21 -08:00
sakridge
830be855dc Forward and hold packets (#15634) 2021-03-03 10:23:05 -08:00
carllin
ae96ba3459 Plumb slot update pubsub notifications (#15488) 2021-02-28 23:29:11 -08:00
sakridge
1b59b163dd Add max retransmit and shred insert slot (#15475) 2021-02-23 13:06:33 -08:00
Trent Nelson
7f7370c306 Re-allow clippy::integer_arithmetic at crate-level 2021-02-17 13:55:08 -07:00