solana

Author	SHA1	Message	Date
behzad nouri	2d930052dc	fixes backports code changes (#20482 )	2021-10-07 03:44:30 +00:00
Trent Nelson	e4aecd9320	Revert "Cost model 1.7 (#20188 )" This reverts commit `1dd6dc3709`.	2021-10-06 16:25:24 -06:00
Tao Zhu	1dd6dc3709	Cost model 1.7 (#20188 ) * Cost Model to limit transactions which are not parallelizeable (#16694) * * Add following to banking_stage: 1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions. 2. CostTracker which is shared between threads, tracks transaction costs for each block. * replace hard coded program ID with id() calls * Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed. * Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table. * add test for cost_tracker atomically try_add operation, serves as safety guard for future changes * check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker; * bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations * replay stage feed back program cost (#17731) * replay stage feeds back realtime per-program execution cost to cost model; * program cost execution table is initialized into empty table, no longer populated with hardcoded numbers; * changed cost unit to microsecond, using value collected from mainnet; * add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs. 
* investigate system performance test degradation (#17919) * Add stats and counter around cost model ops, mainly: - calculate transaction cost - check transaction can fit in a block - update block cost tracker after transactions are added to block - replay_stage to update/insert execution cost to table * Change mutex on cost_tracker to RwLock * removed cloning cost_tracker for local use, as the metrics show clone is very expensive. * acquire and hold locks for block of TXs, instead of acquire and release per transaction; * remove redundant would_fit check from cost_tracker update execution path * refactor cost checking with less frequent lock acquiring * avoid many Transaction_cost heap allocation when calculate cost, which is in the hot path - executed per transaction. * create hashmap with new_capacity to reduce runtime heap realloc. * code review changes: categorize stats, replace explicit drop calls, concisely initiate to default * address potential deadlock by acquiring locks one at time * Persist cost table to blockstore (#18123) * Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks * Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()` * Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time * Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory * Only try to persist to blockstore when cost_table is changed. * Restore cost table during validator startup * Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads; * Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model. 
* log warning when channel send fails (#18391) * Aggregate cost_model into cost_tracker (#18374) * * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions * review fixes * update ledger tool to restore cost table from blockstore (#18489) * update ledger tool to restore cost model from blockstore when compute-slot-cost * Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool * refactor and simplify a test * manually fix merge conflicts * Per-program id timings (#17554) * more manual fixing * solve a merge conflict * featurize cost model * more merge fix * cost model uses compute_unit to replace microsecond as cost unit (#18934) * Reject blocks for costs above the max block cost (#18994) * Update block max cost limit to fix performance regession (#19276) * replace function with const var for better readability (#19285) * Add few more metrics data points (#19624) * periodically report sigverify_stage stats (#19674) * manual merge * cost model nits (#18528) * Accumulate consumed units (#18714) * tx wide compute budget (#18631) * more manual merge * ignore zerorize drop security * - update const cost values with data collected by #19627 - update cost calculation to closely proposed fee schedule #16984 * add transaction cost histogram metrics (#20350) * rebase to 1.7.15 * add tx count and thread id to stats (#20451) each stat reports and resets when slot changes * remove cost_model feature_set * ignore vote transactions from cost model Co-authored-by: sakridge <sakridge@gmail.com> Co-authored-by: Jeff Biseda <jbiseda@gmail.com> Co-authored-by: Jack May <jack@solana.com>	2021-10-06 15:11:41 -05:00
sakridge	474f2bcdf4	Prune sigverify queue (#20315 )	2021-09-30 05:40:48 +02:00
sakridge	257ddbeee1	Tpu vote 1.7 (#20187 ) * Add separate vote processing tpu port * Add feature to send to tpu vote port * Add vote rejecting sigverify mode * use packet.meta.is_simple_vote_tx in place of deserialization * consolidate code that identifies vote tx atcommon path for cpu and gpu * new key for feature set * banking forward tpu vote * add tpu vote port to dockerfile and other review changes * Simplify thread id compare * fix a test; updated cluster_info ABI change Co-authored-by: Tao Zhu <tao@solana.com>	2021-09-29 18:12:58 +02:00
mergify[bot]	47c1730808	uses rayon thread-pool for retransmit-stage parallelization (#19486 ) (#20293 ) (cherry picked from commit `01a7ec8198`) Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-09-29 14:11:46 +00:00
mergify[bot]	1bf88556ee	removes Slot from TransmitShreds (backport #19327 ) (#20260 ) * removes Slot from TransmitShreds (#19327) An earlier version of the code was funneling through stakes along with shreds to broadcast: https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127 This was changed to only slots as stakes computation was pushed further down the pipeline in: https://github.com/solana-labs/solana/pull/18971 However shreds themselves embody which slot they belong to. So pairing them with slot is redundant and adds rooms for bugs should they become inconsistent. (cherry picked from commit `1deb4add81`) # Conflicts: # core/benches/cluster_info.rs # core/src/broadcast_stage.rs # core/src/broadcast_stage/broadcast_duplicates_run.rs # core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs # core/src/broadcast_stage/standard_broadcast_run.rs * removes backport merge conflicts Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-09-28 12:55:01 +00:00
mergify[bot]	a1a0c63862	retransmits shreds recovered from erasure codes (backport #19233 ) (#20249 ) * removes packet-count metrics from retransmit stage Working towards sending shreds (instead of packets) to retransmit stage so that shreds recovered from erasure codes are as well retransmitted. Following commit will add these metrics back to window-service, earlier in the pipeline. (cherry picked from commit `bf437b0336`) # Conflicts: # core/src/retransmit_stage.rs * adds packet/shred count stats to window-service Adding back these metrics from the earlier commit which removed them from retransmit stage. (cherry picked from commit `8198a7eae1`) * removes erroneous uses of Arc<...> from retransmit stage (cherry picked from commit `6e413331b5`) # Conflicts: # core/src/retransmit_stage.rs # core/src/tvu.rs * sends shreds (instead of packets) to retransmit stage Working towards channelling through shreds recovered from erasure codes to retransmit stage. (cherry picked from commit `3efccbffab`) # Conflicts: # core/src/retransmit_stage.rs * returns completed-data-set-info from insert_data_shred instead of opaque (u32, u32) which are then converted to CompletedDataSetInfo at the call-site. (cherry picked from commit `3c71670bd9`) # Conflicts: # ledger/src/blockstore.rs * retransmits shreds recovered from erasure codes Shreds recovered from erasure codes have not been received from turbine and have not been retransmitted to other nodes downstream. This results in more repairs across the cluster which is slower. This commit channels through recovered shreds to retransmit stage in order to further broadcast the shreds to downstream nodes in the tree. (cherry picked from commit `7a8807b8bb`) # Conflicts: # core/src/retransmit_stage.rs # core/src/window_service.rs * removes backport merge conflicts Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-09-27 18:11:37 +00:00
mergify[bot]	5dd1c2191e	shares cluster-nodes between retransmit threads (backport #18947 ) (#20221 ) * shares cluster-nodes between retransmit threads (#18947) cluster_nodes and last_peer_update are not shared between retransmit threads, as each thread have its own value: https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477 Additionally, with shared references, this code: https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328 has a concurrency bug where the thread which does compare_and_swap, updates cluster_nodes much later after other threads have run with outdated cluster_nodes for a while. In particular, the write-lock there may block. (cherry picked from commit `d06dc6c8a6`) # Conflicts: # core/benches/retransmit_stage.rs # core/src/retransmit_stage.rs * removes backport merge conflicts Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-09-26 16:29:34 +00:00
mergify[bot]	597c504c27	generate deterministic seeds for shreds (backport #17950 ) (#20172 ) * generate deterministic seeds for shreds (#17950) * generate shred seed from leader pubkey * clippy * clippy * review * review 2 * fmt * review * check * review * cleanup * fmt (cherry picked from commit `a86ced0bac`) # Conflicts: # core/benches/cluster_info.rs # core/src/broadcast_stage.rs # core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs # core/src/broadcast_stage/standard_broadcast_run.rs # ledger/src/shred.rs # sdk/src/feature_set.rs * removes backport merge conflicts Co-authored-by: jbiseda <jbiseda@gmail.com> Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-09-25 19:09:49 +00:00
mergify[bot]	b112e4a8aa	windows: Make solana-test-validator work (backport #20099 ) (#20123 ) * windows: Make solana-test-validator work (#20099) * windows: Make solana-test-validator work The important changes to get this going on Windows: * ledger lock needs to be done on a file instead of the directory * IPC service needs to use the Windows pipe naming scheme * always disable the JIT * file logging not possible yet because we can't redirect stderr, but this will change once env_logger fixes the pipe output target! * Integrate review feedback (cherry picked from commit `567f30aa1a`) # Conflicts: # validator/src/bin/solana-test-validator.rs # validator/src/lib.rs # validator/src/main.rs * Fix merge conflicts Co-authored-by: Jon Cinque <jon.cinque@gmail.com>	2021-09-24 12:59:12 +00:00
mergify[bot]	aa2098d115	Write helper for multithread update (#18808 ) (#19282 ) Co-authored-by: sakridge <sakridge@gmail.com>	2021-09-02 11:05:15 +00:00
mergify[bot]	eacc69efba	adds validator flag to allow private ip addresses (backport #18850 ) (#18975 ) * adds validator flag to allow private ip addresses (#18850) (cherry picked from commit `d2d5f36a3c`) # Conflicts: # accounts-cluster-bench/Cargo.toml # bench-tps/Cargo.toml # cli/Cargo.toml # core/benches/cluster_info.rs # core/src/banking_stage.rs # core/src/broadcast_stage.rs # core/src/broadcast_stage/broadcast_duplicates_run.rs # core/src/broadcast_stage/fail_entry_verification_broadcast_run.rs # core/src/broadcast_stage/standard_broadcast_run.rs # core/src/cluster_slots_service.rs # core/src/repair_service.rs # core/src/tvu.rs # core/src/validator.rs # dos/Cargo.toml # gossip/src/cluster_info.rs # gossip/src/crds_gossip_pull.rs # gossip/src/crds_gossip_push.rs # gossip/src/gossip_service.rs # local-cluster/Cargo.toml # local-cluster/src/cluster_tests.rs # local-cluster/tests/local_cluster.rs # rpc/Cargo.toml # rpc/src/rpc.rs # tokens/Cargo.toml # validator/Cargo.toml # validator/src/main.rs * removes backport merge conflicts Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-07-29 21:43:24 +00:00
mergify[bot]	c534c928a7	encapsulates turbine peers computations of broadcast & retransmit stages (#18238 ) (#18464 ) Broadcast stage and retransmit stage should arrange nodes on turbine broadcast tree in exactly same order. Additionally any changes to this ordering (e.g. updating how unstaked nodes are handled) requires feature gating to keep the cluster in sync. Current implementation is scattered out over several public methods and exposes too much of implementation details (e.g. usize indices into peers vector) which makes code changes and checking for feature activations more difficult. This commit encapsulates turbine peer computations into a new struct, and only exposes two public methods, get_broadcast_peer and get_retransmit_peers, for call-sites. (cherry picked from commit `04787be8b1`) Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-07-07 14:30:55 +00:00
mergify[bot]	0e7512a225	Fix Nightly Clippy Warnings (backport #18065 ) (#18070 ) * chore: cargo +nightly clippy --fix -Z unstable-options (cherry picked from commit `6514096a67`) # Conflicts: # core/src/banking_stage.rs # core/src/cost_model.rs # core/src/cost_tracker.rs # core/src/execute_cost_table.rs # core/src/replay_stage.rs # core/src/tvu.rs # ledger-tool/src/main.rs # programs/bpf_loader/build.rs # rbpf-cli/src/main.rs # sdk/cargo-build-bpf/src/main.rs # sdk/cargo-test-bpf/src/main.rs # sdk/src/secp256k1_instruction.rs * chore: cargo fmt (cherry picked from commit `789f33e8db`) * Updates BPF program assert_instruction_count tests. (cherry picked from commit `c1e03f3410`) # Conflicts: # programs/bpf/tests/programs.rs * Resolve conflicts Co-authored-by: Alexander Meißner <AlexanderMeissner@gmx.net> Co-authored-by: Michael Vines <mvines@gmail.com>	2021-06-18 20:02:48 +00:00
mergify[bot]	ef205593c5	removes port-based forwarding logic from turbine retransmit (#17716 ) (#17973 ) Turbine retransmit logic is based on which socket it received the packet from (i.e `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see https://github.com/solana-labs/solana/issues/6672 https://github.com/solana-labs/solana/pull/7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in https://github.com/solana-labs/solana/pull/17351 shows that the number of times the index does not match the port is very rare, and therefore this change should be safe. (cherry picked from commit `161838655c`) Co-authored-by: behzad nouri <behzadnouri@gmail.com>	2021-06-15 15:16:20 +00:00
mergify[bot]	e247625025	Create solana-poh and move remaining rpc modules to solana-rpc (backport #17698 ) (#17745 ) * Create solana-poh and move remaining rpc modules to solana-rpc (#17698) * Create solana-poh crate * Move BigTableUploadService to solana-ledger * Add solana-rpc to workspace * Move dependencies to solana-rpc * Move remaining rpc modules to solana-rpc * Single use statement solana-poh * Single use statement solana-rpc (cherry picked from commit `544b3c0d17`) # Conflicts: # Cargo.lock # banking-bench/Cargo.toml # core/Cargo.toml # core/benches/banking_stage.rs # local-cluster/Cargo.toml # rpc/Cargo.toml # stake-monitor/Cargo.toml # validator/Cargo.toml * Fix conflicts & versions Co-authored-by: Tyera Eulberg <teulberg@gmail.com> Co-authored-by: Tyera Eulberg <tyera@solana.com>	2021-06-04 18:19:08 +00:00
Tyera Eulberg	9a5330b7eb	Move gossip modules into solana-gossip crate (#17352 ) * Move gossip modules to solana-gossip * Update Protocol abi digest due to move * Move gossip benches and hook up CI * Remove unneeded Result entries * Single use statements	2021-05-26 09:15:46 -06:00
behzad nouri	9d112cf41f	encapsulates purged values bookkeeping into crds module (#17265 ) For all code paths (gossip push, pull, purge, etc) that remove or override a crds value, it is necessary to record hash of values purged from crds table, in order to exclude them from subsequent pull-requests; otherwise the next pull request will likely return outdated values, wasting bandwidth: https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491 Currently this is done all over the place in multiple modules, and this has caused bugs in the past where purged values were not recorded. This commit encapsulated this bookkeeping into crds module, so that any code path which removes or overrides a crds value, also records the hash of purged value in-place.	2021-05-24 13:47:21 +00:00
Tyera Eulberg	827355a6b1	Create solana-rpc crate and move subscriptions (#17320 ) * Move non_circulating_supply to runtime * Add solana-rpc crate and move max_slots * Move subscriptions to solana-rpc * Single use statements	2021-05-19 00:54:28 -06:00
behzad nouri	1ac2a8cfa5	removes delayed crds inserts when upserting gossip table (#16806 ) It is crucial that VersionedCrdsValue::insert_timestamp does not go backward in time: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L67-L79 Otherwise methods such as get_votes and get_epoch_slots_since will break, which will break their downstream flow, including vote-listener and optimistic confirmation: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 For that, Crds::new_versioned is intended to be called "atomically" with Crds::insert_verioned (as the comment already says so): https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L126-L129 However, currently this is violated in the code. For example, filter_pull_responses creates VersionedCrdsValues (with the current timestamp), then acquires an exclusive lock on gossip, then process_pull_responses writes those values to the crds table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L2375-L2392 Depending on the workload and lock contention, the insert_timestamps may well be in the past when these values finally are inserted into gossip. To avoid such scenarios, this commit: * removes Crds::new_versioned and Crd::insert_versioned. * makes VersionedCrdsValue constructor private, only invoked in Crds::insert, so that insert_timestamp is populated right before insert. This will improve insert_timestamp monotonicity as long as Crds::insert is not called with a stalled timestamp. Following commits may further improve this by calling timestamp() inside Crds::insert, and/or switching to std::time::Instant which guarantees monotonicity.	2021-04-28 11:56:13 +00:00
behzad nouri	03194145c0	removes first_coding_index from erasure recovery code (#16646 ) first_coding_index is the same as the set_index and is so redundant: https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/blockstore_meta.rs#L49-L60	2021-04-23 12:00:37 +00:00
behzad nouri	37b8587d4e	expands number of erasure coding shreds in the last batch in slots (#16484 ) Number of parity coding shreds is always less than the number of data shreds in FEC blocks: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719 Data shreds are batched in chunks of 32 shreds each: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714 However the very last batch of data shreds in a slot can be small, in which case the loss rate can be exacerbated. This commit expands the number of coding shreds in the last FEC block in slots to: 64 - number of data shreds; so that FEC blocks are always 64 data and parity coding shreds each. As a consequence of this, the last FEC block has more parity coding shreds than data shreds. So for some shred indices we will have a coding shred but no data shreds. This should not cause any kind of overlapping FEC blocks as in: https://github.com/solana-labs/solana/pull/10095 since this is done only for the very last batch in a slot, and the next slot will reset the shred index.	2021-04-21 12:47:50 +00:00
steviez	bb24318ef0	Document shreds (#16514 ) No functionality changes from this commit	2021-04-16 14:04:46 -05:00
Justin Starry	85eb37fab0	Merge pull request from GHSA-8v47-8c53-wwrc * Track transaction check time separately from account loads * banking packet process metrics * Remove signature clone in status cache lookup * Reduce allocations when converting packets to transactions * Add blake3 hash of transaction messages in status cache * Bug fixes * fix tests and run fmt * Address feedback * fix simd tx entry verification * Fix rebase * Feedback * clean up * Add tests * Remove feature switch and fall back to signature check * Bump programs/bpf Cargo.lock * clippy * nudge benches * Bump `BankSlotDelta` frozen ABI hash` * Add blake3 to sdk/programs/Cargo.lock * nudge bpf tests * short circuit status cache checks Co-authored-by: Trent Nelson <trent@solana.com>	2021-04-13 00:28:08 -06:00
behzad nouri	3f63ed9a72	removes OrderedIterator and transaction batch iteration order (#16153 ) In TransactionBatch, https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/transaction_batch.rs#L4-L11 lock_results[i] is aligned with transactions[iteration_order[i]]: https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/bank.rs#L2414-L2424 https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/accounts.rs#L788-L817 However load_and_execute_transactions is iterating over lock_results[iteration_order[i]] https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/bank.rs#L2878-L2889 and then returning i as for the index of the retryable transaction. If iteratorion_order is [1, 2, 0], and i is 0, then: lock_results[iteration_order[i]] = lock_results[1] which corresponds to transactions[iteration_order[1]] = transactions[2] so neither i = 0, nor iteration_order[i] = 1 gives the correct index for the corresponding transaction (which is 2). This commit removes OrderedIterator and transaction batch iteration order entirely. There is only one place in blockstore processor which the iteration order is not ordinal: https://github.com/solana-labs/solana/blob/e50f59844/ledger/src/blockstore_processor.rs#L269-L271 It seems like, instead of using an iteration order, that can shuffle entry transactions in-place.	2021-03-31 23:59:19 +00:00
behzad nouri	4f82b897bc	buffers data shreds to make larger erasure coded sets (#15849 ) Broadcast stage batches up to 8 entries: https://github.com/solana-labs/solana/blob/79280b304/core/src/broadcast_stage/broadcast_utils.rs#L26-L29 which will be serialized into some number of shreds and chunked into FEC sets of at most 32 shreds each: https://github.com/solana-labs/solana/blob/79280b304/ledger/src/shred.rs#L576-L597 So depending on the size of entries, FEC sets can be small, which may aggravate loss rate. For example 16 FEC sets of 2:2 data/code shreds each have higher loss rate than one 32:32 set. This commit broadcasts data shreds immediately, but also buffers them until it has a batch of 32 data shreds, at which point 32 coding shreds are generated and broadcasted.	2021-03-23 14:52:38 +00:00
Jeff Washington (jwash)	57ba86c821	eliminate lock on record (#15929 ) * eliminate lock on record * use same error as MaxHeightReached * clippy * review feedback * refactor should_tick code * pr feedback	2021-03-23 09:10:04 -05:00
carllin	c1ba265dd9	Wallclock BankingStage Throttle (#15731 )	2021-03-15 17:11:15 -07:00
sakridge	d09112fa6d	PoH batch size calibration (#15717 )	2021-03-05 16:01:21 -08:00
sakridge	830be855dc	Forward and hold packets (#15634 )	2021-03-03 10:23:05 -08:00
carllin	ae96ba3459	Plumb slot update pubsub notifications (#15488 )	2021-02-28 23:29:11 -08:00
sakridge	1b59b163dd	Add max retransmit and shred insert slot (#15475 )	2021-02-23 13:06:33 -08:00
Trent Nelson	7f7370c306	Re-allow clippy::integer_arithmetic at crate-level	2021-02-17 13:55:08 -07:00
carllin	629dcd0f39	Cleanup buffered packets (#15210 )	2021-02-12 03:27:37 -08:00
behzad nouri	e1021d9f83	removes redundant epoch stakes cache in retransmit (#14781 ) Following `d6d76219b`, staked nodes computed from vote accounts are already cached in runtime::Stakes, so the caching in retransmit_stage is redundant.	2021-01-24 21:15:09 +00:00
sakridge	5c95d8e963	Shred filter (#14030 )	2020-12-10 07:54:15 -08:00
sakridge	c5fe076432	Better dupe detection (#13992 )	2020-12-09 23:14:31 -08:00
behzad nouri	cbea9ebc34	indexes nodes' contact infos in crds table (#13553 ) In several places in gossip code, the entire crds table is scanned only to filter out nodes' contact infos. Currently on mainnet, crds table is of size ~70k, while there are only ~470 nodes. So the full table scan is inefficient. Instead we may maintain an index of only nodes' contact infos.	2020-11-15 16:38:04 +00:00
sakridge	b4cf968e14	Add back shredding broadcast stats (#13463 )	2020-11-09 23:04:27 -08:00
behzad nouri	37c8842bcb	scans crds table in parallel for finding old labels (#13073 ) From runtime profiles, the majority time of ClusterInfo::handle_purge https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626 is spent scanning crds table finding old labels: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197 This can be done in parallel given that gossip thread-pool: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641 is idle when handle_purge is invoked: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681	2020-10-23 14:17:37 +00:00
Michael Vines	959880db60	Remove unused pubkey::Pubkey imports	2020-10-21 19:08:13 -07:00
Michael Vines	17c391121a	Run `codemod --extensions rs Hash::new_rand solana_sdk::hash::new_rand`	2020-10-21 19:08:13 -07:00
Michael Vines	7bc073defe	Run `codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand`	2020-10-21 19:08:13 -07:00
behzad nouri	537bbde22e	builds crds filters in parallel (#12360 ) Based on run-time profiles, the majority time of new_pull_requests is spent building bloom filters, in hashing and bit-vec ops. This commit builds crds filters in parallel using rayon constructs. The added benchmark shows ~5x speedup (4-core machine, 8 threads).	2020-09-29 23:06:02 +00:00
Ryo Onodera	cb8661bd49	Persistent tower (#10718 ) * Save/restore Tower * Avoid unwrap() * Rebase cleanups * Forcibly pass test * Correct reconcilation of votes after validator resume * d b g * Add more tests * fsync and fix test * Add test * Fix fmt * Debug * Fix tests... * save * Clarify error message and code cleaning around it * Move most of code out of tower save hot codepath * Proper comment for the lack of fsync on tower * Clean up * Clean up * Simpler type alias * Manage tower-restored ancestor slots without banks * Add comment * Extract long code blocks... * Add comment * Simplify returned tuple... * Tweak too aggresive log * Fix typo... * Add test * Update comment * Improve test to require non-empty stray restored slots * Measure tower save and dump all tower contents * Log adjust and add threshold related assertions * cleanup adjust * Properly lower stray restored slots priority... * Rust fmt * Fix test.... * Clarify comments a bit and add TowerError::TooNew * Further clean-up arround TowerError * Truly create ancestors by excluding last vote slot * Add comment for stray_restored_slots * Add comment for stray_restored_slots * Use BTreeSet * Consider root_slot into post-replay adjustment * Tweak logging * Add test for stray_restored_ancestors * Reorder some code * Better names for unit tests * Add frozen_abi to SavedTower * Fold long lines * Tweak stray ancestors and too old slot history * Re-adjust error conditon of too old slot history * Test normal ancestors is checked before stray ones * Fix conflict, update tests, adjust behavior a bit * Fix test * Address review comments * Last touch! * Immediately after creating cleaning pr * Revert stray slots * Revert comment... * Report error as metrics * Revert not to panic! and ignore unfixable test... * Normalize lockouts.root_slot more strictly * Add comments for panic! 
and more assertions * Proper initialize root without vote account * Clarify code and comments based on review feedback * Fix rebase * Further simplify based on assured tower root * Reorder code for more readability Co-authored-by: Michael Vines <mvines@gmail.com>	2020-09-19 14:03:54 +09:00
behzad nouri	9b866d79fb	shards crds values based on their hash prefix (#12187 ) filter_crds_values checks every crds filter against every hash value: https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432 which can be inefficient if the filter's bit-mask only matches small portion of the entire crds table. This commit shards crds values into separate tables based on shard_bits first bits of their hash prefix. Given a (mask, mask_bits) filter, filtering crds can be done by inspecting only relevant shards. If CrdsFilter.mask_bits <= shard_bits, then precisely only the crds values which match (mask, mask_bits) bit pattern are traversed. If CrdsFilter.mask_bits > shard_bits, then approximately only 1/2^shard_bits of crds values are inspected. Benchmarking on a gce cluster of 20 nodes, I see ~10% improvement in generate_pull_responses metric, but with larger clusters, crds table and 2^mask_bits are both larger, so the impact should be more significant.	2020-09-17 14:05:16 +00:00
behzad nouri	28f2fa3fd5	uses rust intrinsics to convert hashes to u64 (#12097 )	2020-09-09 15:28:17 +00:00
Michael Vines	d15173ad9d	Address latest nightly clippy lints, but globally disable stable_sort_primitive	2020-08-17 22:36:10 -07:00
carllin	7e25130529	Send votes from banking stage to vote listener (#11434 ) * Send votes from banking stage to vote listener Co-authored-by: Carl <carl@solana.com>	2020-08-07 11:21:35 -07:00


148 Commits