From c70f8d26af0c2fbcfc0ab94d34516cff9eaad03a Mon Sep 17 00:00:00 2001 From: Lijun Wang <83639177+lijunwangs@users.noreply.github.com> Date: Wed, 21 Jul 2021 14:25:53 -0700 Subject: [PATCH] Accounts db replication design proposal (#18651) Problem Validators fall behind the network when bogged down by heavy RPC load. This seems to be due to a combination of CPU load and lock contention caused by serving RPC requests. The most expensive RPC requests involve account scans. Summary of Changes The AccountsDb replication design proposal is described. --- docs/src/proposals/accounts-db-replication.md | 185 ++++++++++++++++++ 1 file changed, 185 insertions(+) create mode 100644 docs/src/proposals/accounts-db-replication.md diff --git a/docs/src/proposals/accounts-db-replication.md b/docs/src/proposals/accounts-db-replication.md new file mode 100644 index 0000000000..c629e1a6dd --- /dev/null +++ b/docs/src/proposals/accounts-db-replication.md @@ -0,0 +1,185 @@ +--- +title: AccountsDB Replication for RPC Services +--- + +## Problem + +Validators fall behind the network when bogged down by heavy RPC load. This +seems to be due to a combination of CPU load and lock contention caused by +serving RPC requests. The most expensive RPC requests involve account scans. + +## Solution Overview + +AccountsDB `replicas` that run separately from the main validator can be used to +offload account-scan requests. Replicas would only: request and pull account +updates from the validator, serve client account-state RPC requests, and manage +AccountsDb and AccountsBackgroundService clean + shrink. + +The replica communicates to the main validator via a new RPC mechanism to fetch +metadata information about the replication and the accounts update from the validator. +The main validator supports only one replica node. A replica node can relay the +information to 1 or more other replicas forming a replication tree. + +At the initial start of the replica node, it downloads the latest snapshot +from a validator and constructs the bank and AccountsDb from it. After that, it queries +its main validator for new slots and requests the validator to send the updated +accounts for that slot and update to its own AccountsDb. + +The same RPC replication mechanism can be used between a replica to another replica. +This requires the replica to serve both the client and server in the replication model. + +On the client RPC serving side, `JsonRpcAccountsService` is responsible for serving +the client RPC calls for accounts related information similar to the existing +`JsonRpcService`. + +The replica will also take snapshots periodically so that it can start quickly after +a restart if the snapshot is not too old. + +## Detailed Solution +The following sections provide more details of the design. + +### Consistency Model +The AccountsDb information is replicated asynchronously from the main validator to the replica. +When a query against the replica's AccountsDb is made, the replica may not have the latest +information of the latest slot. In this regard, it will eventually be consistent. However, for +a particular slot, the information provided is consistent with that of its peer validator +for commitment levels confirmed and finalized. For V1, we only support queries at these two +levels. + +### Solana Replica Node +A new node named solana-replica-node will be introduced whose main responsibility is to maintain +the AccountsDb replica. The RPC node or replica node is used interchangeably in this document. +It will be a separate executable from the validator. + +The replica consists of the following major components: + +The `ReplicaUpdatedSlotsRequestor`: this service is responsible for periodically sending the +request `ReplicaUpdatedSlotsRequest` to its peer validator or replica for the latest slots. +It specifies the latest slot (last_replicated_slot) for which the replica has already +fetched the accounts information for. This maintains the ReplWorkingSlotSet and manages +the lifecycle of BankForks, BlockCommitmentCache (for the highest confirmed slot) and +the optimistically confirmed bank. + +The `ReplicaUpdatedSlotsServer`: this service is responsible for serving the +`ReplicaUpdatedSlotsRequest` and sends the `ReplicaUpdatedSlotsResponse` back to the requestor. +The response consists of a vector of new slots the validator knows of which is later than the +specified last_replicated_slot. This service also runs in the main validator. This service +gets the slots for replication from the BankForks, BlockCommitmentCache and OptimiscallyConfirmBank. + +The `ReplicaAccountsRequestor`: this service is responsible for sending the request +`ReplicaAccountsRequest` to its peer validator or replica for the `ReplicaAccountInfo` for a +slot for which it has not completed accounts db replication. The `ReplicaAccountInfo` contains +the `ReplicaAccountMeta`, Hash and the AccountData. The `ReplicaAccountMeta` contains info about +the existing `AccountMeta` in addition to the account data length in bytes. + +The `ReplicaAccountsServer`: this service is reponsible for serving the `ReplicaAccountsRequest` +and sends `ReplicaAccountsResponse` to the requestor. The response contains the count of the +ReplAccountInfo and the vector of ReplAccountInfo. This service runs both in the validator +and the replica relaying replication information. The server can stream the account information +from its AccountCache or from the storage if already flushed. This is similar to how a snapshot +package is created from the AccountsDb with the difference that the storage does not need to be +flushed to the disk before streaming to the client. If the account data is in the cache, it can +be directly streamed. Care must be taken to avoid account data for a slot being cleaned while +serving the streaming. When attempting a replication of a slot, if the slot is already cleaned +up with accounts data cleaned as result of update in later rooted slots, the replica should +forsake this slot and try the later uncleaned root slot. + +During replication we also need to replicate the information of accounts that have been cleaned +up due to zero lamports, i.e. we need to be able to tell the difference between an account in a +given slot which was not updated and hence has no storage entry in that slot, and one that +holds 0 lamports and has been cleaned up through the history. We may record this via some +"Tombstone" mechanism -- recording the dead accounts cleaned up fora slot. The tombstones +themselves can be removed after exceeding the retention period expressed as epochs. Any +attempt to replicate slots with tombstones removed will fail and the replica should skip +this slot and try later ones. + +The `JsonRpcAccountsService`: this is the RPC service serving client requests for account +information. The existing JsonRpcService serves other client calls than AccountsDb ones. +The replica node only serves the AccountsDb calls. + +The existing JsonRpcService requires `BankForks`, `OptimisticallyConfirmedBank` and +`BlockCommitmentCache` to load the Bank. The JsonRpcAccountsService will need to use +information obtained from ReplicaUpdatedSlotsResponse to construct the AccountsDb. + +The `AccountsBackgroundService`: this service also runs in the replica which is responsible +for taking snapshots periodically and shrinking the AccountsDb and doing accounts cleaning. +The existing code also uses BankForks which we need to keep in the replica. + +### Compatibility Consideration + +For protocol compatibility considerations, all the requests have the replication version which +is initially set to 1. Alternatively, we can use the validator's version. The RPC server side +shall check the request version and fail if it is not supported. + +### Replication Setup +To limit adverse effects on the validator and the replica due to replication, they can be +configured with a list of replica nodes which can form a replication pair with it. And the +replica node is configured with the validator which can serve its requests. + + +### Fault Tolerance +The main responsibility of making sure the replication is tolerant of faults lies with the +replica. In case of request failures, the replica shall retry the requests. + + +### Interface + +Following are the client RPC APIs supported by the replica node in JsonRpcAccountsService. + +- getAccountInfo +- getBlockCommitment +- getMultipleAccounts +- getProgramAccounts +- getMinimumBalanceForRentExemption +- getInflationGovenor +- getInflationRate +- getEpochSchedule +- getRecentBlockhash +- getFees +- getFeeCalculatorForBlockhash +- getFeeRateGovernor +- getLargestAccounts +- getSupply +- getStakeActivation +- getTokenAccountBalance +- getTokenSupply +- getTokenLargestAccounts +- getTokenAccountsByOwner +- getTokenAccountsByDelegate + +Following APIs are not included: + +- getInflationReward +- getClusterNodes +- getRecentPerformanceSamples +- getGenesisHash +- getSignatueStatuses +- getMaxRetransmitSlot +- getMaxShredInsertSlot +- sendTransaction +- simulateTransaction +- getSlotLeader +- getSlotLeaders +- minimumLedgerSlot +- getBlock +- getBlockTime +- getBlocks +- getBlocksWithLimit +- getTransaction +- getSignaturesForAddress +- getFirstAvailableBlock +- getBlockProduction + + +Action Items + +1. Build the replica framework and executable +2. Integrate snapshot restore code for bootstrap the AccountsDb. +3. Develop the ReplicaUpdatedSlotsRequestor and ReplicaUpdatedSlotsServer interface code +4. Develop the ReplicaUpdatedSlotsRequestor and ReplicaUpdatedSlotsServer detailed implementations: managing the ReplEligibleSlotSet lifecycle: adding new roots and deleting root to it. And interfaces managing ReplWorkingSlotSet interface: adding and removing. Develop component synthesising information from BankForks, BlockCommitmentCache and OptimistcallyConfirmedBank on the server side and maintaining information on the client side. +5. Develop the interface code for ReplicaAccountsRequestor and ReplicaAccountsServer +6. Develop detailed implementation for ReplicaAccountsRequestor and ReplicaAccountsServer and develop the replication account storage serializer and deserializer. +7. Develop the interface code JsonRpcAccountsService +8. Detailed Implementation of JsonRpcAccountsService, refactor code to share with part of JsonRpcService. +9. Integrate with the AccountsBackgroundService in the replica for shrinking, cleaning, snapshotting. +10. Metrics and performance testing