From 8116fe8defdcaa0d9b131524541ca7c2a66856f0 Mon Sep 17 00:00:00 2001
From: carllin
Date: Thu, 3 Jan 2019 14:12:55 -0800
Subject: [PATCH] Add proposed design for db_ledger (#2253)

* Add proposed design for db_ledger
---
 book/src/entry-tree.md | 47 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/book/src/entry-tree.md b/book/src/entry-tree.md
index 4b6289cd5b..6a59433c86 100644
--- a/book/src/entry-tree.md
+++ b/book/src/entry-tree.md
@@ -11,7 +11,7 @@ The basic responsibilities of the window and the ledger in a Solana fullnode
 are:
 
  1. Window: serve as a temporary, RAM-backed store of blobs of the PoH chain
-    for re-ordering and assembly into contiguous blocks to be sent to the bank
+    for reordering and assembly into contiguous blocks to be sent to the bank
     for verification.
  2. Window: serve as a RAM-backed repair facility for other validator nodes,
     which may query the network for as-yet unreceived blobs.
@@ -90,19 +90,52 @@ preserving the chain of origination.
 (i.e. dealing with forks) will have to be used for the most recent entries in
 the EntryTree.
 
+### EntryTree Design
+
+1. Entries in the EntryTree are stored as key-value pairs, where the key is the concatenated
+slot index and blob index for an entry, and the value is the entry data. Note that blob indexes are zero-based for each slot (i.e. they're slot-relative).
+
+2. The EntryTree maintains metadata for each slot, in the `SlotMeta` struct containing:
+    * `slot_index` - The index of this slot
+    * `num_blocks` - The number of blocks in the slot (used for chaining to a previous slot)
+    * `consumed` - The highest blob index `n`, such that for all `m < n`, there exists a blob in this slot with blob index equal to `m` (i.e. the highest consecutive blob index).
+    * `received` - The highest received blob index for the slot
+    * `next_slots` - A list of future slots this slot could chain to. Used when rebuilding
+    the ledger to find possible fork points.
+    * `consumed_ticks` - Tick height of the highest received blob (used to identify when a slot is full)
+    * `is_trunk` - True iff every block from 0...slot forms a full sequence without any holes. We can derive is_trunk for each slot with the following rules. Let slot(n) be the slot with index `n`, and let slot(n).contains_all_ticks() be true if the slot with index `n` has all the ticks expected for that slot. Let is_trunk(n) be the statement "slot(n).is_trunk is true". Then:
+
+        is_trunk(0)
+        is_trunk(n+1) iff (is_trunk(n) and slot(n).contains_all_ticks())
+
+3. Chaining - When a blob for a new slot `x` arrives, we check the number of blocks (`num_blocks`) for that new slot (this information is encoded in the blob). We then know that this new slot chains to slot `x - num_blocks`.
+
+4. Subscriptions - The EntryTree records a set of slots that have been "subscribed" to. This means entries that chain to these slots will be sent on the EntryTree channel for consumption by the ReplayStage. See the `EntryTree APIs` for details.
+
+5. Update notifications - The EntryTree notifies listeners when slot(n).is_trunk is flipped from false to true for any `n`.
+
+### EntryTree APIs
+
+The EntryTree offers a subscription-based API that ReplayStage uses to ask for entries it's interested in. The entries will be sent on a channel exposed by the EntryTree. These subscription APIs are as follows:
+    1. `fn get_slots_since(slot_indexes: &[u64]) -> Vec`: Returns new slots connecting to any element of the list `slot_indexes`.
+
+    2. `fn get_slot_entries(slot_index: u64, entry_start_index: usize, max_entries: Option) -> Vec`: Returns the entry vector for the slot starting with `entry_start_index`, capping the result at `max` if `max_entries == Some(max)`; otherwise, no upper limit on the length of the return vector is imposed.
+
+Note: Cumulatively, this means that the replay stage will now have to know when a slot is finished, and subscribe to the next slot it's interested in to get the next set of entries. Previously, the burden of chaining slots fell on the EntryTree.
+
 ### Interfacing with Bank
 
 The bank exposes to replay stage:
 
- 1. prev_id: which PoH chain it's working on as indicated by the id of the last
+ 1. `prev_id`: which PoH chain it's working on as indicated by the id of the last
     entry it processed
- 2. tick_height: the ticks in the PoH chain currently being verified by this
+ 2. `tick_height`: the ticks in the PoH chain currently being verified by this
     bank
- 3. votes: a stack of records that contain
+ 3. `votes`: a stack of records that contain:
 
-    1. prev_ids: what anything after this vote must chain to in PoH
-    2. tick height: the tick_height at which this vote was cast
-    3. lockout period: how long a chain must be observed to be in the ledger to
+    1. `prev_ids`: what anything after this vote must chain to in PoH
+    2. `tick_height`: the tick height at which this vote was cast
+    3. `lockout period`: how long a chain must be observed to be in the ledger to
        be able to be chained below this vote
 
 Replay stage uses EntryTree APIs to find the longest chain of entries it can
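
Reviewer note: the `consumed`, chaining, and `is_trunk` rules proposed in the EntryTree Design section can be sketched in Rust roughly as follows. This is an illustrative sketch under stated assumptions, not code from the patch: the function names `consumed`, `parent_slot`, and `derive_is_trunk` are hypothetical, and the `full` slice is an assumed stand-in for `slot(n).contains_all_ticks()`.

```rust
use std::collections::BTreeSet;

/// `consumed` as described in the design: the highest `n` such that every
/// blob index `m < n` has been received for the slot.
/// (Hypothetical helper; the patch does not define this function.)
fn consumed(received_blob_indexes: &BTreeSet<u64>) -> u64 {
    let mut n = 0;
    // Walk upward from 0 until the first missing blob index.
    while received_blob_indexes.contains(&n) {
        n += 1;
    }
    n
}

/// Chaining rule: a new slot `x` chains to slot `x - num_blocks`.
/// Returns `None` when there is no valid earlier slot to chain to.
/// (Assumption: `num_blocks == 0` is treated as "no parent".)
fn parent_slot(slot_index: u64, num_blocks: u64) -> Option<u64> {
    if num_blocks == 0 {
        return None;
    }
    slot_index.checked_sub(num_blocks)
}

/// Derives `is_trunk` for slots `0..full.len()` using the two rules:
///   is_trunk(0)
///   is_trunk(n+1) iff (is_trunk(n) and slot(n).contains_all_ticks())
/// `full[n]` is an assumed stand-in for `slot(n).contains_all_ticks()`.
fn derive_is_trunk(full: &[bool]) -> Vec<bool> {
    let mut trunk: Vec<bool> = Vec::with_capacity(full.len());
    for n in 0..full.len() {
        let value = if n == 0 {
            true
        } else {
            trunk[n - 1] && full[n - 1]
        };
        trunk.push(value);
    }
    trunk
}
```

For instance, with blobs 0, 1, 2, and 4 received, `consumed` stops at the hole at index 3. Once one slot is missing ticks, every later slot in `derive_is_trunk` stays off the trunk until that slot fills, which is the transition the proposed update notifications would report.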