Add RPC transaction history design
This commit is contained in:
		
							
								
								
									
										106
									
								
								docs/src/implemented-proposals/rpc-transaction-history.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							@@ -0,0 +1,106 @@
# Long term RPC Transaction History
There's a need for RPC to serve at least 6 months of transaction history.  The
current history, on the order of days, is insufficient for downstream users.
					
6 months of transaction data cannot be stored practically in a validator's
rocksdb ledger, so an external data store is necessary.  The validator's
rocksdb ledger will continue to serve as the primary data source, and RPC will
fall back to the external data store for data the ledger no longer holds.
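
The lookup order described above can be sketched as follows. This is only an illustration under stated assumptions: the `BlockSource` trait, the `InMemorySource` type, and `get_block_with_fallback` are hypothetical names that do not come from the validator code, and plain strings stand in for real block data.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over anything that can return block data for a slot.
trait BlockSource {
    fn get_block(&self, slot: u64) -> Option<String>;
}

/// In-memory stand-in, used here for both the rocksdb ledger and the
/// external data store.
struct InMemorySource(HashMap<u64, String>);

impl BlockSource for InMemorySource {
    fn get_block(&self, slot: u64) -> Option<String> {
        self.0.get(&slot).cloned()
    }
}

/// Consult the local ledger first; query the external store only on a miss.
fn get_block_with_fallback(
    ledger: &dyn BlockSource,
    external: &dyn BlockSource,
    slot: u64,
) -> Option<String> {
    ledger.get_block(slot).or_else(|| external.get_block(slot))
}
```

Keeping the ledger first means recent slots never incur a round trip to the external store; only requests for older slots pay that cost.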
					
The affected RPC endpoints are:
* [getFirstAvailableBlock](https://docs.solana.com/apps/jsonrpc-api#getfirstavailableblock)
* [getConfirmedBlock](https://docs.solana.com/apps/jsonrpc-api#getconfirmedblock)
* [getConfirmedBlocks](https://docs.solana.com/apps/jsonrpc-api#getconfirmedblocks)
* [getConfirmedSignaturesForAddress](https://docs.solana.com/apps/jsonrpc-api#getconfirmedsignaturesforaddress)
* [getConfirmedTransaction](https://docs.solana.com/apps/jsonrpc-api#getconfirmedtransaction)
* [getSignatureStatuses](https://docs.solana.com/apps/jsonrpc-api#getsignaturestatuses)
					
Note that [getBlockTime](https://docs.solana.com/apps/jsonrpc-api#getblocktime)
is not supported, because once https://github.com/solana-labs/solana/issues/10089
is fixed `getBlockTime` can be removed.
					
Some system design constraints:
* The volume of data to store and search can quickly jump into the terabytes,
  and is immutable.
* The system should be as light as possible for SREs.  For example, an SQL
  database cluster that requires an SRE to continually monitor and rebalance
  nodes is undesirable.
* Data must be searchable in real time; batched queries that take minutes or
  hours to run are unacceptable.
* It should be easy to replicate the data worldwide to co-locate it with the
  RPC endpoints that will utilize it.
* Interfacing with the external data store should be easy and not require
  depending on risky, lightly-used, community-supported code libraries.
					
Based on these constraints, Google's BigTable product is selected as the data
store.
					
## Table Schema
A BigTable instance is used to hold all transaction data, broken up into
different tables for quick searching.
					
New data may be copied into the instance at any time without affecting the
existing data, and all data is immutable.  Generally the expectation is that new
data will be uploaded once the current epoch completes, but there is no
limitation on the frequency of data dumps.
					
Cleanup of old data is automatic: once the data retention policy of the
instance tables is configured appropriately, old data simply disappears.  The
order in which data is added therefore becomes important.  For example, if data
from epoch N-1 is added after data from epoch N, the older epoch data will
outlive the newer data.  However, beyond producing _holes_ in query results,
this kind of unordered deletion will have no ill effect.  Note that this method
of cleanup effectively allows for an unlimited amount of transaction data to be
stored, restricted only by the monetary costs of doing so.
					
The table layout supports the existing RPC endpoints only.  New RPC endpoints
in the future may require additions to the schema and potentially iterating over
all transactions to build up the necessary metadata.
					
## Accessing BigTable
BigTable has a gRPC endpoint that can be accessed using the
[tonic](https://crates.io/crates/tonic) crate and the raw protobuf API, as
currently no higher-level Rust crate for BigTable exists.  In practice this
makes parsing the results of BigTable queries more complicated, but it is not a
significant issue.
					
## Data Population
The ongoing population of instance data will occur on an epoch cadence through the
use of a new `solana-ledger-tool` command that will convert rocksdb data for a
given slot range into the instance schema.
					
The same process will be run once, manually, to backfill the existing ledger
data.
					
### Block Table: `block`

This table contains the compressed block data for a given slot.
					
The row key is the 16-digit lower-case hexadecimal representation of the slot,
which ensures that the oldest slot with a confirmed block will always be first
when the rows are listed.  For example, the row key for slot 42 would be
`000000000000002a`.
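
A minimal sketch of this encoding is shown below. The function names `slot_to_block_key` and `block_key_to_slot` are hypothetical and chosen for illustration only. Zero-padding to a fixed width is what makes the lexicographic order of the keys match the numeric order of the slots.

```rust
/// Encode a slot as a `block` table row key: 16 lower-case hex digits,
/// zero-padded on the left.
fn slot_to_block_key(slot: u64) -> String {
    format!("{:016x}", slot)
}

/// Decode a row key back into a slot. Returns `None` unless the key is
/// exactly 16 lower-case hex digits.
fn block_key_to_slot(key: &str) -> Option<u64> {
    let well_formed = key.len() == 16
        && key.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if well_formed {
        u64::from_str_radix(key, 16).ok()
    } else {
        None
    }
}
```

Without the padding, a key such as `100` would sort before `ff` even though slot 256 is newer than slot 255.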
					
The row data is a compressed `StoredConfirmedBlock` struct.
					
### Account Address Transaction Signature Lookup Table: `tx-by-addr`

This table contains the transactions that affect a given address.
					
The row key is
`<base58 address>/<slot-id-one's-complement-hex-slot-0-prefixed-to-16-digits>`.
The row data is a compressed `TransactionByAddrInfo` struct.
					
Taking the one's complement of the slot ensures that the newest slot with
transactions that affect an address will always be listed first.
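
The effect of the one's complement on ordering can be sketched as follows. The helper names `tx_by_addr_key` and `slot_from_tx_by_addr_key` are hypothetical, and the address is passed in as an already base58-encoded string rather than being encoded here.

```rust
/// Build a `tx-by-addr` row key: the base58 address, a `/`, then the one's
/// complement of the slot as 16 zero-padded lower-case hex digits.
fn tx_by_addr_key(address_base58: &str, slot: u64) -> String {
    // `!slot` flips every bit, so larger (newer) slots map to smaller values.
    format!("{}/{:016x}", address_base58, !slot)
}

/// Recover the slot from a row key by inverting the complement again.
/// Returns `None` if the key has no `/` or the suffix is not 16 hex digits.
fn slot_from_tx_by_addr_key(key: &str) -> Option<u64> {
    let (_, hex) = key.rsplit_once('/')?;
    if hex.len() != 16 {
        return None;
    }
    u64::from_str_radix(hex, 16).ok().map(|inverted| !inverted)
}
```

Because rows are listed in ascending key order, a prefix scan on `<address>/` yields the most recent slots first, which suits queries that want an address's latest activity.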
					
Sysvar addresses are not indexed.  However, frequently used programs such as
Vote or System are, and will likely have a row for every confirmed slot.
					
### Transaction Signature Lookup Table: `tx`

This table maps a transaction signature to its confirmed block and its index
within that block.
					
The row key is the base58-encoded transaction signature.
The row data is a compressed `TransactionInfo` struct.
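
A transaction lookup by signature would then plausibly take two steps: resolve the signature in `tx`, then read the block from `block`. The sketch below uses in-memory maps as stand-ins for the two tables. The `TxLocation` struct and its field names are assumptions, since the actual fields of `TransactionInfo` are not given here, and blocks are modelled as plain lists of strings with no compression.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for the `TransactionInfo` row data: the slot of the
/// confirmed block and the transaction's index within that block.
struct TxLocation {
    slot: u64,
    index: usize,
}

/// Resolve a signature through the `tx` stand-in, then fetch the transaction
/// from the `block` stand-in, which is keyed by 16-digit hex slot.
fn find_transaction<'a>(
    tx_table: &HashMap<String, TxLocation>,
    block_table: &'a BTreeMap<String, Vec<String>>,
    signature: &str,
) -> Option<&'a String> {
    let location = tx_table.get(signature)?;
    let block = block_table.get(&format!("{:016x}", location.slot))?;
    block.get(location.index)
}
```

Either step can miss independently, for example when the retention policy has already removed the block row, so the sketch returns `None` in both cases.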