On the receiving end, outdated values are discarded anyway, so sending them
only wastes bandwidth:
https://github.com/solana-labs/solana/blob/3f0480d06/core/src/crds_gossip_pull.rs#L385-L400
This also slows down validator start, since the entrypoint returns old
values in pull responses which the validator immediately discards,
resulting in a long delay until the validator obtains the entrypoint's
contact-info, adopts the shred version, and fully starts.
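A minimal sketch of the sending-side idea, with illustrative names (`CrdsValue`, the `wallclock` field, and the timeout constant here are stand-ins, not the crate's exact API): values whose wallclock falls outside the accepted window around the local time are dropped before they are put on the wire.

```rust
/// Illustrative timeout constant; the real crate derives crds timeouts from
/// stake and epoch duration.
const CRDS_GOSSIP_PULL_CRDS_TIMEOUT_MS: u64 = 15_000;

/// Stand-in for the real crds value type.
struct CrdsValue {
    wallclock: u64, // origin's timestamp in milliseconds
}

/// Drop values that are too old (or too far in the future) relative to `now`
/// before putting them into a pull response, so stale data never hits the
/// wire only to be discarded by the receiver.
fn filter_outdated(values: Vec<CrdsValue>, now: u64) -> Vec<CrdsValue> {
    let timeout = CRDS_GOSSIP_PULL_CRDS_TIMEOUT_MS;
    values
        .into_iter()
        .filter(|value| {
            value.wallclock >= now.saturating_sub(timeout)
                && value.wallclock <= now.saturating_add(timeout)
        })
        .collect()
}
```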
(cherry picked from commit 5e6b00fe98)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
When a validator starts, it has an (almost) empty crds table and it only
sends one pull-request to the entrypoint. The bloom filter in the
pull-request targets a 10% false-positive rate given the number of items.
So, if `num_items` is far too small, the result is a very small bloom
filter with a very high false-positive rate:
https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80
https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48
As a result, it is very unlikely that the validator obtains the
entrypoint's contact-info in the response. This prolongs the time the
validator loops on:
> Waiting to adopt entrypoint shred version
https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412
This commit increases the minimum number of bloom items when making gossip
pull requests. Effectively, this breaks the entrypoint's crds table into
64 shards, with one pull-request and a larger bloom filter for each shard,
increasing the chances that the response includes the entrypoint's
contact-info, which is needed for adopting the shred version and fully
starting the validator.
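For intuition, a rough sketch using the standard bloom-filter sizing formula; the constant and function names below are illustrative, not the crate's:

```rust
// Sketch of standard bloom-filter sizing: the number of bits (and hence the
// false-positive rate) is derived from the expected number of items, so a
// badly underestimated `num_items` yields a tiny, nearly useless filter.
// Clamping `num_items` to a minimum forces a reasonably sized filter even
// when the local crds table is almost empty.
const MIN_NUM_BLOOM_ITEMS: usize = 512; // illustrative, not the crate constant

fn num_bits(num_items: usize, false_rate: f64) -> usize {
    let n = num_items.max(MIN_NUM_BLOOM_ITEMS) as f64;
    // m = -n * ln(p) / (ln 2)^2
    (-n * false_rate.ln() / (2f64.ln().powi(2))).ceil() as usize
}

fn main() {
    // With only a handful of known values, the unclamped filter would be tiny;
    // the clamp keeps it large enough to meaningfully encode what we have.
    println!("bits when starting with 10 items:  {}", num_bits(10, 0.1));
    println!("bits with a populated table:       {}", num_bits(10_000, 0.1));
}
```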
(cherry picked from commit e8b35a4f7b)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
The current implementations use only the id and disregard other fields, in
particular the wallclock. This can lead to bugs where an outdated
contact-info shadows or overrides a current one because the two compare
equal.
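A hedged sketch of the intent (type and field names here are assumptions): freshness is decided by wallclock, not by an id-only equality.

```rust
/// Stand-in for the gossip contact-info record.
struct ContactInfo {
    id: [u8; 32],   // node pubkey (illustrative representation)
    wallclock: u64, // origin's timestamp in milliseconds
}

/// Decide whether `new` should replace `old`. Comparing only `id` would make
/// an outdated record "equal" to a newer one; taking the wallclock into
/// account lets the fresher record win.
fn overrides(new: &ContactInfo, old: &ContactInfo) -> bool {
    new.id == old.id && new.wallclock > old.wallclock
}
```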
(cherry picked from commit 13b032b2d4)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
* rpc: plumb shred_version through RpcContactInfo
(cherry picked from commit 67e6a3106f)
* test-validator: Display more cluster info in dash
(cherry picked from commit 754c708473)
Co-authored-by: Trent Nelson <trent@solana.com>
* Require that the blockstore block-time slot only be a recognized slot, instead of a root
* Move cache_block_time to after Bank freeze
* Single use statement
* Pass transaction_status_sender by reference
* Remove unnecessary slot-existence check before caching block time altogether
* Move block-time existence check into Blockstore::cache_block_time; Blockstore is no longer needed in the blockstore_processor helper
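A minimal sketch of the intended guard, using stand-in types; the real Blockstore API and column layout differ:

```rust
use std::collections::HashMap;

type Slot = u64;
type UnixTimestamp = i64;

/// Toy stand-in for the ledger blockstore.
#[derive(Default)]
struct Blockstore {
    slot_meta: HashMap<Slot, ()>, // stand-in for the real slot-meta column
    block_times: HashMap<Slot, UnixTimestamp>,
}

impl Blockstore {
    /// Cache a block time only for slots the blockstore recognizes (rather
    /// than requiring the slot to already be a root). Keeping the check here
    /// means callers such as the blockstore_processor helper no longer need
    /// a Blockstore handle just to pre-validate the slot.
    fn cache_block_time(&mut self, slot: Slot, timestamp: UnixTimestamp) -> Result<(), String> {
        if !self.slot_meta.contains_key(&slot) {
            return Err(format!("slot {} is not recognized by the blockstore", slot));
        }
        self.block_times.insert(slot, timestamp);
        Ok(())
    }
}
```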
(cherry picked from commit 6e9deaf1bd)
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
* indexes crds values by their insert order
(cherry picked from commit dfa3e7a61c)
* reads gossip push messages off crds ordinal index
Having an ordinal index on crds values based on insert order makes it
possible to filter values efficiently using a cursor. In particular, the
CrdsGossipPush::push_messages hash-map can be replaced with a cursor,
saving on bookkeeping, purging, etc.
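A rough sketch of the pattern, with illustrative types (not the crate's): once entries are keyed by a monotonically increasing ordinal, "what is new since the last push" becomes a range scan from a single cursor rather than a separately maintained hash-map.

```rust
use std::collections::BTreeMap;

/// Stand-in crds table: entries keyed by the ordinal assigned at insert time.
#[derive(Default)]
struct CrdsTable {
    entries: BTreeMap<u64, String>, // ordinal -> value (value type illustrative)
    next_ordinal: u64,
}

impl CrdsTable {
    fn insert(&mut self, value: String) {
        self.entries.insert(self.next_ordinal, value);
        self.next_ordinal += 1;
    }

    /// Everything inserted since `cursor`, as a range scan over the ordinal
    /// index; the cursor is advanced so the next call only sees newer values.
    /// This replaces keeping (and purging) a separate map of pending push
    /// messages.
    fn entries_since<'a>(&'a self, cursor: &mut u64) -> Vec<&'a String> {
        let start = *cursor;
        *cursor = self.next_ordinal;
        self.entries.range(start..).map(|(_, value)| value).collect()
    }
}
```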
(cherry picked from commit 22c02b917e)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
VersionedCrdsValue.insert_timestamp is used for fetching crds values
inserted since the last query:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298
So it is crucial that insert_timestamp does not go backward in time as new
values are inserted into the table. However, std::time::SystemTime is not
monotonic; moreover, due to workload, lock contention, thread scheduling,
etc., new values may be inserted with a stale timestamp far in the past.
Additionally, reading the system time for this purpose is inefficient and
unnecessary.
This commit adds an ordinal index to crds values indicating their insert
order. Additionally, it implements a new Cursor type for fetching values
inserted since last query.
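A hedged sketch of the cursor idea (names are illustrative, not the crate's API): ordinals come from a counter that only moves forward, so "fetch everything inserted since the last query" cannot miss values the way a stalled SystemTime-based insert_timestamp could.

```rust
/// Position of the first not-yet-observed ordinal.
#[derive(Default, Clone, Copy)]
struct Cursor(u64);

impl Cursor {
    /// Records that the value with this ordinal has been observed, so the
    /// next query starts strictly after it.
    fn consume(&mut self, ordinal: u64) {
        self.0 = self.0.max(ordinal + 1);
    }
}

/// Stand-in for a versioned crds entry.
struct VersionedValue {
    ordinal: u64,  // insert order, assigned from a counter that only increases
    value: String, // payload type is illustrative
}

/// Returns values inserted since the cursor's last position and advances it.
fn get_entries<'a>(table: &'a [VersionedValue], cursor: &mut Cursor) -> Vec<&'a VersionedValue> {
    let mut out = Vec::new();
    for value in table {
        if value.ordinal >= cursor.0 {
            cursor.consume(value.ordinal);
            out.push(value);
        }
    }
    out
}
```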
(cherry picked from commit fa86a335b0)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
* uses Mutex instead of RwLock for ping_cache
(cherry picked from commit 2231017b35)
* validates gossip addresses before sending pull-requests
IP addresses need to be validated before packets are sent to them.
This commit sends a ping packet to nodes before any pull requests.
Pull requests are then sent only to nodes that have responded with the
correct hash of their respective ping packet.
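A minimal sketch of the handshake with stand-in types (the real implementation uses a cryptographic hash and signed ping/pong messages; `DefaultHasher` below is only a placeholder):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

/// Placeholder for the real cryptographic hash of the ping token.
fn token_hash(token: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
struct PingCache {
    pending: HashMap<SocketAddr, u64>, // outstanding ping tokens per address
    verified: HashSet<SocketAddr>,     // addresses that answered correctly
}

impl PingCache {
    /// Returns true if the address is already verified; otherwise records a
    /// ping token that must be answered before any pull-request is sent there.
    fn check(&mut self, addr: SocketAddr, new_token: u64) -> bool {
        if self.verified.contains(&addr) {
            return true;
        }
        self.pending.entry(addr).or_insert(new_token);
        false // caller should send a ping carrying `new_token` instead
    }

    /// Marks the address verified if the pong carries the hash of its token.
    fn on_pong(&mut self, addr: SocketAddr, pong_hash: u64) {
        if let Some(token) = self.pending.remove(&addr) {
            if token_hash(token) == pong_hash {
                self.verified.insert(addr);
            }
        }
    }
}
```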
(cherry picked from commit 7cea2c4466)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Local timestamps are updated for records associated with a pubkey if the
origin is still active:
https://github.com/solana-labs/solana/blob/c8ed14c64/core/src/crds.rs#L301-L311
However, this is done inconsistently: on some gossip paths (pull requests
and pull responses) but not others (e.g. push messages). Additionally,
update_record_timestamp is inefficient since there can be ~800 values
associated with each pubkey.
This commit updates record timestamps only on contact-infos and instead
uses the origin's timestamp when purging old values.
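A hedged sketch of the purge-side change (field names are assumptions): a value survives purging if either its local timestamp or the origin's own wallclock is still recent, so non-contact-info records no longer need their local timestamps refreshed on every message from an active origin.

```rust
/// Stand-in for a crds table entry as stored by this node.
struct VersionedCrdsValue {
    local_timestamp: u64, // when this node last updated the record locally
    wallclock: u64,       // origin's timestamp carried in the value itself
}

/// A value is purged only if both its local timestamp and the origin's
/// wallclock have fallen outside the timeout window.
fn should_purge(value: &VersionedCrdsValue, now: u64, timeout: u64) -> bool {
    let freshest = value.local_timestamp.max(value.wallclock);
    now.saturating_sub(freshest) > timeout
}
```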
(cherry picked from commit 2c82f2154d)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
* Add TPU client for sending txs to the current leader tpu port
* Update tpu_client.rs
(cherry picked from commit 75b8434b76)
Co-authored-by: Justin Starry <justin@solana.com>
* expands number of erasure coding shreds in the last batch in slots (#16484)
Number of parity coding shreds is always less than the number of data
shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719
Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714
However, the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.
This commit expands the number of coding shreds in a slot's last FEC block
to 64 minus the number of data shreds, so that each FEC block always holds
64 data and parity coding shreds in total.
As a consequence, the last FEC block has more parity coding shreds than
data shreds, so for some shred indices we will have a coding shred but no
data shred. This should not cause any kind of overlapping FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot resets the shred index.
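The resulting arithmetic, as a small sketch (the constant name is illustrative):

```rust
// Full batches stay at 32 data + 32 coding shreds, while the final (possibly
// short) batch of a slot gets 64 - num_data coding shreds so that every FEC
// block totals 64 shreds.
const DATA_SHREDS_PER_FEC_BLOCK: usize = 32; // batch size used for data shreds

fn num_coding_shreds(num_data: usize, is_last_in_slot: bool) -> usize {
    if is_last_in_slot {
        2 * DATA_SHREDS_PER_FEC_BLOCK - num_data
    } else {
        num_data
    }
}

fn main() {
    // A full batch is unchanged: 32 data + 32 coding shreds.
    assert_eq!(num_coding_shreds(32, false), 32);
    // A short final batch of e.g. 5 data shreds gets 59 coding shreds,
    // keeping the last FEC block at 64 shreds total.
    assert_eq!(num_coding_shreds(5, true), 59);
}
```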
(cherry picked from commit 37b8587d4e)
# Conflicts:
# core/benches/shredder.rs
# ledger/src/shred.rs
* removes backport merge conflicts
* ignore the flaky test for now
Co-authored-by: behzad nouri <behzadnouri@gmail.com>