Add RPC transaction history design

parent dfae9a9864
commit e56ea138c7

docs/src/implemented-proposals/rpc-transaction-history.md (new file, 106 lines)
@@ -0,0 +1,106 @@

# Long term RPC Transaction History
There's a need for RPC to serve at least 6 months of transaction history. The
current history, on the order of days, is insufficient for downstream users.

6 months of transaction data cannot be stored practically in a validator's
rocksdb ledger so an external data store is necessary. The validator's
rocksdb ledger will continue to serve as the primary data source, and RPC
will fall back to the external data store when a request cannot be served
from the local ledger.
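
As a rough sketch of that flow, the lookup could be layered as below; the
`ConfirmedBlock` type and the two helper functions are illustrative
placeholders, not the validator's actual APIs:

```rust
// Illustrative placeholders standing in for the validator's real storage
// interfaces; only the fallback ordering is the point of this sketch.
struct ConfirmedBlock;

fn lookup_in_rocksdb(_slot: u64) -> Option<ConfirmedBlock> {
    None // pretend the slot has already been purged from the local ledger
}

fn lookup_in_bigtable(_slot: u64) -> Option<ConfirmedBlock> {
    Some(ConfirmedBlock) // the external store holds the long-term history
}

/// Serve from the local rocksdb ledger first, and only consult the external
/// data store when the slot is no longer available locally.
fn get_confirmed_block(slot: u64) -> Option<ConfirmedBlock> {
    lookup_in_rocksdb(slot).or_else(|| lookup_in_bigtable(slot))
}
```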

The affected RPC endpoints are:
* [getFirstAvailableBlock](https://docs.solana.com/apps/jsonrpc-api#getfirstavailableblock)
* [getConfirmedBlock](https://docs.solana.com/apps/jsonrpc-api#getconfirmedblock)
* [getConfirmedBlocks](https://docs.solana.com/apps/jsonrpc-api#getconfirmedblocks)
* [getConfirmedSignaturesForAddress](https://docs.solana.com/apps/jsonrpc-api#getconfirmedsignaturesforaddress)
* [getConfirmedTransaction](https://docs.solana.com/apps/jsonrpc-api#getconfirmedtransaction)
* [getSignatureStatuses](https://docs.solana.com/apps/jsonrpc-api#getsignaturestatuses)

Note that [getBlockTime](https://docs.solana.com/apps/jsonrpc-api#getblocktime)
is not supported, because `getBlockTime` is slated for removal once
https://github.com/solana-labs/solana/issues/10089 is fixed.

Some system design constraints:
* The volume of data to store and search can quickly jump into the terabytes,
  and is immutable.
* The system should be as light as possible for SREs. For example an SQL
  database cluster that requires an SRE to continually monitor and rebalance
  nodes is undesirable.
* Data must be searchable in real time - batched queries that take minutes or
  hours to run are unacceptable.
* Easy to replicate the data worldwide to co-locate it with the RPC endpoints
  that will utilize it.
* Interfacing with the external data store should be easy and not require
  depending on risky lightly-used community-supported code libraries.

Based on these constraints, Google's BigTable product is selected as the data
store.

## Table Schema
A BigTable instance is used to hold all transaction data, broken up into
different tables for quick searching.

New data may be copied into the instance at any time without affecting the
existing data, and all data is immutable. Generally the expectation is that
new data will be uploaded once the current epoch completes, but there is no
limitation on the frequency of data dumps.

Cleanup of old data is automatic: by configuring the data retention policy of
the instance tables appropriately, expired data simply disappears. Therefore
the order in which data is added becomes important. For example, if data from
epoch N-1 is added after data from epoch N, the older epoch data will outlive
the newer data. However, beyond producing _holes_ in query results, this kind
of unordered deletion will have no ill effect. Note that this method of
cleanup effectively allows for an unlimited amount of transaction data to be
stored, restricted only by the monetary cost of doing so.

The table layout supports the existing RPC endpoints only. New RPC endpoints
in the future may require additions to the schema and potentially iterating
over all transactions to build up the necessary metadata.

## Accessing BigTable
BigTable has a gRPC endpoint that can be accessed using the
[tonic](https://crates.io/crates/tonic) crate and the raw protobuf API, as
currently no higher-level Rust crate for BigTable exists. Practically this
makes parsing the results of BigTable queries more complicated, but it is not
a significant issue.
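
For illustration only, a `build.rs` along the following lines could generate
the client bindings with `tonic-build` from the raw protobuf definitions; the
`proto/` layout and the choice to vendor the googleapis files are assumptions
made for this sketch, not part of the proposal:

```rust
// build.rs - a minimal sketch, assuming the googleapis protobuf files
// (including google/bigtable/v2/bigtable.proto) are vendored under `proto/`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        .build_server(false) // only a client is needed to query BigTable
        .compile(
            &["proto/google/bigtable/v2/bigtable.proto"],
            &["proto"],
        )?;
    Ok(())
}
```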

## Data Population
The ongoing population of instance data will occur on an epoch cadence through
the use of a new `solana-ledger-tool` command that will convert rocksdb data
for a given slot range into the instance schema.

The same process will be run once, manually, to backfill the existing ledger
data.

### Block Table: `block`

This table contains the compressed block data for a given slot.

The row key is generated by taking the 16-digit lower case hexadecimal
representation of the slot, to ensure that the oldest slot with a confirmed
block will always be first when the rows are listed. e.g. the row key for slot
42 would be `000000000000002a`.
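
For example, deriving the row key is just a fixed-width hex encoding of the
slot; a sketch follows (the helper name is illustrative, not the actual
upload code):

```rust
/// Encode a slot as the `block` table row key: 16 lower-case hex digits,
/// so lexicographic row order matches ascending slot order.
fn block_row_key(slot: u64) -> String {
    format!("{:016x}", slot)
}

#[test]
fn block_row_key_for_slot_42() {
    assert_eq!(block_row_key(42), "000000000000002a");
}
```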

The row data is a compressed `StoredConfirmedBlock` struct.

### Account Address Transaction Signature Lookup Table: `tx-by-addr`

This table contains the transactions that affect a given address.

The row key is
`<base58 address>/<slot-id-one's-complement-hex-slot-0-prefixed-to-16-digits>`.
The row data is a compressed `TransactionByAddrInfo` struct.

Taking the one's complement of the slot ensures that the newest slot with
transactions that affect an address will always be listed first.
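
A sketch of the key construction under these rules (the helper name is
illustrative, and the address is assumed to already be base58-encoded):

```rust
/// Build a `tx-by-addr` row key. `!slot` is the one's complement of the
/// slot, so newer (higher) slots produce lexicographically smaller keys and
/// are therefore listed first.
fn tx_by_addr_row_key(address: &str, slot: u64) -> String {
    format!("{}/{:016x}", address, !slot)
}

#[test]
fn newer_slots_list_first() {
    let addr = "Vote111111111111111111111111111111111111111";
    assert!(tx_by_addr_row_key(addr, 43) < tx_by_addr_row_key(addr, 42));
}
```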

Sysvar addresses are not indexed. However, frequently used programs such as
Vote or System are, and will likely have a row for every confirmed slot.

### Transaction Signature Lookup Table: `tx`

This table maps a transaction signature to its confirmed block, and its index
within that block.

The row key is the base58-encoded transaction signature.
The row data is a compressed `TransactionInfo` struct.
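
Conceptually the row data only needs to locate the transaction; a plausible
shape is sketched below, with field names and types that are illustrative
rather than the actual `TransactionInfo` definition:

```rust
/// Illustrative shape of the `tx` row data (not the real definition):
/// just enough to find the transaction from its signature.
struct TransactionInfo {
    slot: u64,  // the slot of the confirmed block containing the transaction
    index: u32, // the transaction's position within that block
}
```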