shards crds values based on their hash prefix (#12187)

filter_crds_values checks every crds filter against every hash value:
https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432
which can be inefficient if the filter's bit-mask only matches small
portion of the entire crds table.

This commit shards crds values into separate tables based on shard_bits
first bits of their hash prefix. Given a (mask, mask_bits) filter,
filtering crds can be done by inspecting only relevant shards.

If CrdsFilter.mask_bits <= shard_bits, then precisely only the crds
values which match (mask, mask_bits) bit pattern are traversed.
If CrdsFilter.mask_bits > shard_bits, then approximately only
1/2^shard_bits of crds values are inspected.

Benchmarking on a gce cluster of 20 nodes, I see ~10% improvement in
generate_pull_responses metric, but with larger clusters, crds table and
2^mask_bits are both larger, so the impact should be more significant.
This commit is contained in:
behzad nouri
2020-09-17 14:05:16 +00:00
committed by GitHub
parent 5546b6b676
commit 9b866d79fb
6 changed files with 414 additions and 60 deletions

View File

@@ -29,6 +29,7 @@ pub mod crds_gossip;
pub mod crds_gossip_error;
pub mod crds_gossip_pull;
pub mod crds_gossip_push;
pub mod crds_shards;
pub mod crds_value;
pub mod epoch_slots;
pub mod fetch_stage;