solana

Author	SHA1	Message	Date
Tao Zhu	b6dff12923	update ledger tool to restore cost table from blockstore (#18489 ) * update ledger tool to restore cost model from blockstore when compute-slot-cost * Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool * refactor and simplify a test	2021-07-07 23:44:51 -05:00
Michael Vines	1e0942e900	Rename ClusterInfo::send_vote to ClusterInfo::send_transaction	2021-07-07 15:51:14 -07:00
jbiseda	a86ced0bac	generate deterministic seeds for shreds (#17950 ) * generate shred seed from leader pubkey * clippy * clippy * review * review 2 * fmt * review * check * review * cleanup * fmt	2021-07-07 08:21:12 -07:00
behzad nouri	a0551b4054	persists repair-peers cache across repair service loops (#18400 ) The repair-peers cache is reset each time the repair service loop runs, and so is recomputed repeatedly for the same slots: https://github.com/solana-labs/solana/blob/d2b07dca9/core/src/repair_service.rs#L275 This commit uses an LRU cache to persist repair-peers for each slot. In addition to LRU eviction rules, in order to avoid re-using outdated data, each entry also has a 10-second TTL.	2021-07-07 14:12:09 +00:00
behzad nouri	04787be8b1	encapsulates turbine peers computations of broadcast & retransmit stages (#18238 ) Broadcast stage and retransmit stage should arrange nodes on the turbine broadcast tree in exactly the same order. Additionally, any change to this ordering (e.g. updating how unstaked nodes are handled) requires feature gating to keep the cluster in sync. The current implementation is scattered over several public methods and exposes too many implementation details (e.g. usize indices into the peers vector), which makes code changes and checking for feature activations more difficult. This commit encapsulates turbine peer computations into a new struct, and only exposes two public methods, get_broadcast_peer and get_retransmit_peers, for call-sites.	2021-07-07 00:35:25 +00:00
Justin Starry	100fabf469	Remove feature switch for demoting sysvar write locks (#18373 )	2021-07-06 21:22:22 +00:00
Tao Zhu	0e039b4094	Aggregate cost_model into cost_tracker (#18374 ) * * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions * review fixes	2021-07-06 15:41:25 +00:00
Michael Vines	d5c2c72360	Rename Tower::lockouts to Tower::vote_state	2021-07-02 18:35:49 -07:00
Tao Zhu	7cd6224caf	log warning when channel send fails (#18391 )	2021-07-02 19:04:09 +00:00
carllin	0eca92de18	Make set roots an iterator (#18357 )	2021-07-01 20:02:40 -07:00
Michael Vines	b6792a3328	Add ability to change the validator identity at runtime	2021-07-01 17:50:04 -07:00
Brooks Prumo	45d54b1fc6	Add SnapshotArchiveInfo and refactor functions in snapshot_utils (#18232 )	2021-07-01 12:20:56 -05:00
Tao Zhu	5e424826ba	Persist cost table to blockstore (#18123 ) * Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks * Add ProgramCosts to the compaction exclusion list alongside TransactionStatusIndex in one place: `excludes_from_compaction()` * Write cost table to blockstore after `replay_stage` has replayed active banks; add stats to measure persist time * Delete programs from `ProgramCosts` in blockstore when they are removed from cost_table in memory * Only try to persist to blockstore when cost_table has changed. * Restore cost table during validator startup * Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads; * Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.	2021-07-01 11:32:41 -05:00
Brooks Prumo	89a3e4f91e	Move SnapshotConfig into its own module (#18331 ) Also move ArchiveFormat to snapshot_utils, and do not reexport SnapshotVersion.	2021-07-01 08:55:26 -05:00
sakridge	8d9a6deda4	Add repair number per slot (#18082 )	2021-06-30 18:20:07 +02:00
Trent Nelson	02b14caa5f	test-validator: hold rent constant with `--slots-per-epoch`	2021-06-30 00:46:12 -06:00
carllin	68c87469c3	Cleanup ReplayStage tests (#18241 )	2021-06-28 20:19:42 -07:00
Tao Zhu	9d6f1ebef4	investigate system performance test degradation (#17919 ) * Add stats and counter around cost model ops, mainly: - calculate transaction cost - check transaction can fit in a block - update block cost tracker after transactions are added to block - replay_stage to update/insert execution cost to table * Change mutex on cost_tracker to RwLock * removed cloning cost_tracker for local use, as the metrics show clone is very expensive. * acquire and hold locks for a block of TXs, instead of acquiring and releasing per transaction; * remove redundant would_fit check from cost_tracker update execution path * refactor cost checking with less frequent lock acquiring * avoid many Transaction_cost heap allocations when calculating cost, which is in the hot path - executed per transaction. * create hashmap with new_capacity to reduce runtime heap realloc. * code review changes: categorize stats, replace explicit drop calls, concisely initialize to default * address potential deadlock by acquiring locks one at a time	2021-06-28 21:34:04 -05:00
sakridge	5d08bf9aa3	More detailed voting timings in replay stage (#18229 )	2021-06-26 17:32:08 +02:00
Trent Nelson	d269975784	Revert "Clean up build warning" This reverts commit `17a173ebb5`.	2021-06-24 19:57:52 -06:00
Michael Vines	314102cb54	Remove redundant JsonRpcConfig::identity_pubkey field	2021-06-22 17:20:11 -07:00
sakridge	e808f34b0b	Add batch stats (#18096 )	2021-06-22 15:23:26 +02:00
Michael Vines	3b1517237c	Clean up argument names	2021-06-21 21:29:52 -07:00
Michael Vines	84b9de8c18	Shredder no longer holds a keypair	2021-06-21 21:29:52 -07:00
Michael Vines	2435ea3ad8	Remove redundant ReplayStageConfig::my_pubkey field	2021-06-21 21:29:52 -07:00
Michael Vines	51a0007001	serve_repair: Remove internal ContactInfo field duplication	2021-06-21 17:23:49 -07:00
behzad nouri	598093b5db	adds shred-version to ip-echo-server response When starting a validator, the node initially joins gossip with shred_version = 0, until it adopts the entrypoint's shred-version: https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417 Depending on the load on the entrypoint, adopting the entrypoint's shred-version through gossip sometimes becomes very slow, and causes several problems in gossip because we have to partially support shred_version == 0, which is a source of leaking crds values from one cluster to another. e.g. see https://github.com/solana-labs/solana/pull/17899 and the other linked issues there. In order to remove shred_version == 0 from gossip, this commit adds shred-version to ip-echo-server response. Once the entrypoints are updated, on validator start-up, if --expected_shred_version is not specified we will obtain shred-version from the entrypoint using ip-echo-server.	2021-06-21 19:37:16 +00:00
Jeff Washington (jwash)	ec2f930475	use process.accounts_db_test_hash_calculation for debug_verify hash (#18053 )	2021-06-21 10:20:27 -05:00
Michael Vines	4a12c715a3	Drop Error suffix from enum values to avoid the enum_variant_names clippy lint	2021-06-18 23:02:13 +00:00
Alexander Meißner	789f33e8db	chore: cargo fmt	2021-06-18 10:42:46 -07:00
Alexander Meißner	6514096a67	chore: cargo +nightly clippy --fix -Z unstable-options	2021-06-18 10:42:46 -07:00
Tyera Eulberg	d0511de9a6	chore: bump trees from 0.2.1 to 0.4.2 (#18052 ) * chore: bump trees from 0.2.1 to 0.4.2 (#18041) Bumps [trees](https://github.com/oooutlk/trees) from 0.2.1 to 0.4.2. - [Release notes](https://github.com/oooutlk/trees/releases) - [Commits](https://github.com/oooutlk/trees/commits) --- updated-dependencies: - dependency-name: trees dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Accommodate field & type changes Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-06-17 22:45:09 +00:00
Lijun Wang	071b1ee3e5	Removed pub from some functions which are actually private to improve encapsulation (#18030 ) Remove the pub marker to improve encapsulation. Readability improvement only, no functional impact.	2021-06-17 10:14:21 -07:00
Michael Vines	fa04531c7a	Extricate RpcCompletedSlotsService from RetransmitStage	2021-06-16 16:20:35 -07:00
Trent Nelson	5bc6c89adc	validator: run poh speed test earlier in start up	2021-06-16 21:27:08 +00:00
behzad nouri	161838655c	removes port-based forwarding logic from turbine retransmit (#17716 ) Turbine retransmit logic is based on which socket it received the packet from (i.e. `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see https://github.com/solana-labs/solana/issues/6672 https://github.com/solana-labs/solana/pull/7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in https://github.com/solana-labs/solana/pull/17351 show that the index very rarely fails to match the port, and therefore this change should be safe.	2021-06-15 13:19:41 +00:00
carllin	ccc013e134	Handle removing slots during account scans (#17471 )	2021-06-14 21:04:01 -07:00
sakridge	eeee75c5be	Don't use pinned memory when unnecessary (#17832 ) Reports of excessive GPU memory usage and errors from cudaHostRegister. There are some cases where pinning is not required.	2021-06-14 16:10:04 +02:00
sakridge	0feac57cb0	Don't store votes unless we are leader soon (#17803 )	2021-06-11 18:29:05 +02:00
carllin	c8535be0e1	Port unconfirmed duplicate tracking logic from ProgressMap to ForkChoice (#17779 )	2021-06-11 03:09:57 -07:00
carllin	afafa624a3	Account for duplicate before a bank is frozen or replayed (#17866 )	2021-06-10 22:28:23 -07:00
Lijun Wang	269d995832	Make account shrink configurable #17544 (#17778 ) 1. Added both options for measuring space usage using total accounts usage and for individual store shrink ratio using an enum. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio 2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage 3. Added unit tests for the new functions added 4. The default implementation is kept at 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true Fixes #17544	2021-06-09 21:21:32 -07:00
Tao Zhu	ae27fcbcda	replay stage feed back program cost (#17731 ) * replay stage feeds back realtime per-program execution cost to cost model; * program cost execution table is initialized into empty table, no longer populated with hardcoded numbers; * changed cost unit to microsecond, using value collected from mainnet; * add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.	2021-06-09 17:10:59 -05:00
Justin Starry	050bb5446d	Add local cluster tests that broadcast duplicate slots (#13995 ) * Add duplicate node local cluster test * fix clippy * remove dupe test	2021-06-09 15:01:48 -07:00
Michael Vines	e5e7390d44	Wrap long lines	2021-06-08 12:05:29 -07:00
Tyera Eulberg	544b3c0d17	Create solana-poh and move remaining rpc modules to solana-rpc (#17698 ) * Create solana-poh crate * Move BigTableUploadService to solana-ledger * Add solana-rpc to workspace * Move dependencies to solana-rpc * Move remaining rpc modules to solana-rpc * Single use statement solana-poh * Single use statement solana-rpc	2021-06-04 09:23:06 -06:00
sakridge	f97ce2cd7e	Per-program id timings (#17554 )	2021-06-04 16:04:31 +02:00
behzad nouri	be957f25c9	adds fallback logic if retransmit multicast fails (#17714 ) In retransmit-stage, based on the packet.meta.seed and resulting children/neighbors, each packet is sent to a different set of peers: https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457 However, current code errors out as soon as a multicast call fails, which will skip all the remaining packets: https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can exacerbate packet loss in turbine. This commit: * keeps iterating over the retransmit packets for-loop even if some intermediate sends fail. * adds a fallback to UdpSocket::send_to if multicast fails. Recent discord chat: https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733	2021-06-04 12:16:37 +00:00
Tyera Eulberg	3a647c4bea	Rename ValidatorExit and move to sdk (#17728 )	2021-06-04 03:06:13 +00:00
carllin	96ba2edfeb	Switch EpochSlots to be frozen slots, not completed slots (#17168 )	2021-06-03 00:20:00 +00:00

2273 Commits