Long-term ledger storage with BigTable (bp #11222) (#11392)

* ledger-storage-bigtable boilerplate

(cherry picked from commit 9d2293bb32)

* $ wget https://pki.goog/roots.pem -O pki-goog-roots.pem

(cherry picked from commit 1617a025ce)

* Add access_token module

(cherry picked from commit 59d266a111)

* Add root_ca_certificate

(cherry picked from commit faa016e4b7)

* Add build-proto

(cherry picked from commit c31e1f5bf0)

* UiTransactionEncoding is now copy

(cherry picked from commit 494968be66)

* Increase timeout

(cherry picked from commit 57dfebc5ba)

* Add build-proto/build.sh output

(cherry picked from commit 54dae6ba2c)

* Supress doctest errors

(cherry picked from commit 019c75797d)

* Add compression

(cherry picked from commit 243e05d59f)

* Add bigtable

(cherry picked from commit 6e0353965a)

* Add configuration info

(cherry picked from commit 98cca1e774)

* Add ledger-tool bigtable subcommands

(cherry picked from commit f9049d6ee4)

# Conflicts:
#	ledger-tool/Cargo.toml

* Make room for tokio 0.2

(cherry picked from commit b876fb84ba)

# Conflicts:
#	core/Cargo.toml

* Setup a tokio 0.2 runtime for RPC usage

(cherry picked from commit 0e02740565)

# Conflicts:
#	core/Cargo.toml

* Plumb Bigtable ledger storage into the RPC subsystem

(cherry picked from commit dfae9a9864)

# Conflicts:
#	core/Cargo.toml

* Add RPC transaction history design

(cherry picked from commit e56ea138c7)

* Simplify access token refreshing

(cherry picked from commit 1f7af14386)

* Report block status more frequently

(cherry picked from commit 22c46ebf96)

* after -> before

(cherry picked from commit 227ea934ff)

* Rebase

* Cargo.lock

Co-authored-by: Michael Vines <mvines@gmail.com>
This commit is contained in:
mergify[bot]
2020-08-06 04:06:44 +00:00
committed by GitHub
parent 6542a04521
commit 5841e4d665
35 changed files with 6214 additions and 90 deletions

View File

@@ -0,0 +1,106 @@
# Long term RPC Transaction History
There's a need for RPC to serve at least 6 months of transaction history. The
current history, on the order of days, is insufficient for downstream users.
6 months of transaction data cannot be stored practically in a validator's
rocksdb ledger so an external data store is necessary. The validator's
rocksdb ledger will continue to serve as the primary data source, and then will
fall back to the external data store.
The affected RPC endpoints are:
* [getFirstAvailableBlock](https://docs.solana.com/apps/jsonrpc-api#getfirstavailableblock)
* [getConfirmedBlock](https://docs.solana.com/apps/jsonrpc-api#getconfirmedblock)
* [getConfirmedBlocks](https://docs.solana.com/apps/jsonrpc-api#getconfirmedblocks)
* [getConfirmedSignaturesForAddress](https://docs.solana.com/apps/jsonrpc-api#getconfirmedsignaturesforaddress)
* [getConfirmedTransaction](https://docs.solana.com/apps/jsonrpc-api#getconfirmedtransaction)
* [getSignatureStatuses](https://docs.solana.com/apps/jsonrpc-api#getsignaturestatuses)
Note that [getBlockTime](https://docs.solana.com/apps/jsonrpc-api#getblocktime)
is not supported, as once https://github.com/solana-labs/solana/issues/10089 is
fixed then `getBlockTime` can be removed.
Some system design constraints:
* The volume of data to store and search can quickly jump into the terabytes,
and is immutable.
* The system should be as light as possible for SREs. For example an SQL
database cluster that requires an SRE to continually monitor and rebalance
nodes is undesirable.
* Data must be searchable in real time - batched queries that take minutes or
hours to run are unacceptable.
* Easy to replicate the data worldwide to co-locate it with the RPC endpoints
that will utilize it.
* Interfacing with the external data store should be easy and not require
depending on risky lightly-used community-supported code libraries
Based on these constraints, Google's BigTable product is selected as the data
store.
## Table Schema
A BigTable instance is used to hold all transaction data, broken up into
different tables for quick searching.
New data may be copied into the instance at anytime without affecting the existing
data, and all data is immutable. Generally the expectation is that new data
will be uploaded once an current epoch completes but there is no limitation on
the frequency of data dumps.
Cleanup of old data is automatic by configuring the data retention policy of the
instance tables appropriately, it just disappears. Therefore the order of when data is
added becomes important. For example if data from epoch N-1 is added after data
from epoch N, the older epoch data will outlive the newer data. However beyond
producing _holes_ in query results, this kind of unordered deletion will
have no ill effect. Note that this method of cleanup effectively allows for an
unlimited amount of transaction data to be stored, restricted only by the
monetary costs of doing so.
The table layout s supports the existing RPC endpoints only. New RPC endpoints
in the future may require additions to the schema and potentially iterating over
all transactions to build up the necessary metadata.
## Accessing BigTable
BigTable has a gRPC endpoint that can be accessed using the
[tonic](https://crates.io/crates/crate)] and the raw protobuf API, as currently no
higher-level Rust crate for BigTable exists. Practically this makes parsing the
results of BigTable queries more complicated but is not a significant issue.
## Data Population
The ongoing population of instance data will occur on an epoch cadence through the
use of a new `solana-ledger-tool` command that will convert rocksdb data for a
given slot range into the instance schema.
The same process will be run once, manually, to backfill the existing ledger
data.
### Block Table: `block`
This table contains the compressed block data for a given slot.
The row key is generated by taking the 16 digit lower case hexadecimal
representation of the slot, to ensure that the oldest slot with a confirmed
block will always be first when the rows are listed. eg, The row key for slot
42 would be 000000000000002a.
The row data is a compressed `StoredConfirmedBlock` struct.
### Account Address Transaction Signature Lookup Table: `tx-by-addr`
This table contains the transactions that affect a given address.
The row key is `<base58
address>/<slot-id-one's-compliment-hex-slot-0-prefixed-to-16-digits>`. The row
data is a compressed `TransactionByAddrInfo` struct.
Taking the one's compliment of the slot allows for listing of slots ensures that
the newest slot with transactions that affect an address will always
be listed first.
Sysvar addresses are not indexed. However frequently used programs such as
Vote or System are, and will likely have a row for every confirmed slot.
### Transaction Signature Lookup Table: `tx`
This table maps a transaction signature to its confirmed block, and index within that block.
The row key is the base58-encoded transaction signature.
The row data is a compressed `TransactionInfo` struct.