Add proposal for transactions v2 and address map program (#17103) (#18376)

* Add proposal for supporting big transactions * account index program * fix formatting * review feedback * Add cost changes section * Add cost section and more attack details * fix lint * document metadata changes * nit * rpc details * add index meta struct * add additional proposal and chagne title * rename proposal file * rename to address map and rewrite tx format * no more appends, limit mapping size to 256 * update dos section * add note about readonly * restructure message to use enum * cleanup (cherry picked from commit 191519188d) Co-authored-by: Justin Starry <justin@solana.com>
2021-07-02 02:39:44 +00:00
parent 0200740d70
commit eeaf0234f0
1 changed files with 302 additions and 0 deletions
--- a/docs/src/proposals/transactions-v2.md
+++ b/docs/src/proposals/transactions-v2.md
@@ -0,0 +1,302 @@
 # Transactions v2 - Address maps
 ## Problem
 Messages transmitted to Solana validators must not exceed the IPv6 MTU size to
 ensure fast and reliable network transmission of cluster info over UDP.
 Solana's networking stack uses a conservative MTU size of 1280 bytes which,
 after accounting for headers, leaves 1232 bytes for packet data like serialized
 transactions.
 Developers building applications on Solana must design their on-chain program
 interfaces within the above transaction size limit constraint. One common
 work-around is to store state temporarily on-chain and consume that state in
 later transactions. This is the approach used by the BPF loader program for
 deploying Solana programs.
 However, this workaround doesn't work well when developers compose many on-chain
 programs in a single atomic transaction. With more composition comes more
 account inputs, each of which takes up 32 bytes. There is currently no available
 workaround for increasing the number of accounts used in a single transaction
 since each transaction must list all accounts that it needs to properly lock
 accounts for parallel execution. Therefore the current cap is about 35 accounts
 after accounting for signatures and other transaction metadata.
 ## Proposed Solution
 Introduce a new on-chain program which stores account address maps and add a new
 transaction format which supports concise account references through the
 on-chain address maps.
 ### Address Map Program
 Here we describe a program-based solution to the problem, whereby a protocol
 developer or end-user can create collections of related addresses on-chain for
 concise use in a transaction's account inputs. This approach is similar to page
 tables used in operating systems to succinctly map virtual addresses to physical
 memory.
 After addresses are stored on-chain in an address map account, they may be
 succinctly referenced in a transaction using a 1-byte u8 index rather than a
 full 32-byte address. This will require a new transaction format to make use of
 these succinct references as well as runtime handling for looking up and loading
 accounts from the on-chain mappings.
 #### State
 Address map accounts must be rent-exempt but may be closed with a one epoch
 deactivation period. Address maps must be activated before use.
 Since transactions use a u8 offset to look up mapped addresses, accounts can
 store up to 2^8 addresses each. Anyone may create an address map account of any
 size as long as its big enough to store the necessary metadata. In addition to
 stored addresses, address map accounts must also track the latest count of
 stored addresses and an authority which must be a present signer for all
 appended map entries.
 Map additions require one slot to activate so each map should track how many
 addresses are still pending activation in their on-chain state:
 ```rust
 struct AddressMap {
  // authority must sign for each addition and to close the map account
  authority: Pubkey,
  // record a deactivation epoch to help validators know when to remove
  // the map from their caches.
  deactivation_epoch: Epoch,
  // entries may not be modified once activated
  activated: bool,
  // list of entries, max capacity of u8::MAX
  entries: Vec<Pubkey>,
 }
 ```
 #### Cleanup
 Once an address map gets stale and is no longer used, it can be reclaimed by the
 authority withdrawing lamports but the remaining balance must be greater than
 two epochs of rent. This ensures that it takes at least one full epoch to
 deactivate a map.
 Maps may not be recreated because each new map must be created at a derived
 address using a monotonically increasing counter as a derivation seed.
 #### Cost
 Since address map accounts require caching and special handling in the runtime,
 they should incur higher costs for storage. Cost structure design will be added
 later.
 ### Versioned Transactions
 In order to allow accounts to be referenced more succinctly, the structure of
 serialized transactions must be modified. The new transaction format should not
 affect transaction processing in the Solana VM beyond the increased capacity for
 accounts and program invocations. Invoked programs will be unaware of which
 transaction format was used.
 The new transaction format must be distinguished from the current transaction
 format. Current transactions can fit at most 19 signatures (64-bytes each) but
 the message header encodes `num_required_signatures` as a `u8`. Since the upper
 bit of the `u8` will never be set for a valid transaction, we can enable it to
 denote whether a transaction should be decoded with the versioned format or not.
 #### New Transaction Format
 ```rust
 #[derive(Serialize, Deserialize)]
 pub struct Transaction {
    #[serde(with = "short_vec")]
    pub signatures: Vec<Signature>,
    /// The message to sign.
    pub message: Message,
 }
 // Uses custom serialization. If the first bit is set, a versioned message is
 // encoded starting from the next byte. If the first bit is not set, all bytes
 // are used to encode the original unversioned `Message` format.
 pub enum Message {
  Unversioned(UnversionedMessage),
  Versioned(VersionedMessage),
 }
 // use bincode varint encoding to use u8 instead of u32 for enum tags
 #[derive(Serialize, Deserialize)]
 pub enum VersionedMessage {
  Current(Box<MessageV2>)
 }
 #[derive(Serialize, Deserialize)]
 pub struct MessageV2 {
  // unchanged
  pub header: MessageHeader,
  // unchanged
  #[serde(with = "short_vec")]
  pub account_keys: Vec<Pubkey>,
  /// The last `address_maps.len()` number of readonly unsigned account_keys
  /// should be loaded as address maps
  #[serde(with = "short_vec")]
  pub address_maps: Vec<AddressMap>,
  // unchanged
  pub recent_blockhash: Hash,
  // unchanged. Account indices are still `u8` encoded so the max number of accounts
  // in account_keys + address_maps is limited to 256.
  #[serde(with = "short_vec")]
  pub instructions: Vec<CompiledInstruction>,
 }
 #[derive(Serialize, Deserialize)]
 pub struct AddressMap {
  /// The last num_readonly_entries of entries are read-only
  pub num_readonly_entries: u8,
  /// List of map entries to load
  #[serde(with = "short_vec")]
  pub entries: Vec<u8>,
 }
 ```
 #### Size changes
 - 1 byte for `prefix` field
 - 1 byte for version enum discriminant
 - 1 byte for `address_maps` length
 - Each map requires 2 bytes for `entries` length and `num_readonly`
 - Each map entry is 1 byte (u8)
 #### Cost changes
 Using an address map in a transaction should incur an extra cost due to
 the extra work validators need to do to load and cache them.
 #### Metadata changes
 Each account accessed via an address map should be stored in the transaction
 metadata for quick reference. This will avoid the need for clients to make
 multiple RPC round trips to fetch all accounts referenced in a v2 transaction.
 It will also make it easier to use the ledger tool to analyze account access
 patterns.
 #### RPC changes
 Fetched transaction responses will likely require a new version field to
 indicate to clients which transaction structure to use for deserialization.
 Clients using pre-existing RPC methods will receive error responses when
 attempting to fetch a versioned transaction which will indicate that they
 must upgrade.
 The RPC API should also support an option for returning fully expanded
 transactions to abstract away the address map details from downstream clients.
 ### Limitations
 - Max of 256 accounts may be specified in a transaction because u8 is used by compiled
 instructions to index into transaction message account keys.
 - Address maps can hold up to 256 addresses because references to map entries
 are encoded as `u8` in transactions.
 - Transaction signers may not be referenced with an address map, the full
 address of each signer must be serialized in the transaction. This ensures that
 the performance of transaction signature checks is not affected.
 - Hardware wallets will probably not be able to display details about accounts
 referenced through address maps due to inability to verify on-chain data.
 - Only single level address maps can be used. Recursive maps will not be supported.
 ## Security Concerns
 ### Resource consumption
 Enabling more account inputs in a transaction allows for more program
 invocations, write-locks, and data reads / writes. Before address maps are
 enabled, transaction-wide compute limits and increased costs for write locks and
 data reads are required.
 ### Front running
 If the addresses listed within an address map account are modifiable, front
 running attacks could modify which mapped accounts are resolved for a later
 transaction. For this reason, we propose that any stored address is immutable
 and that address map accounts themselves may not be recreated.
 Additionally, a malicious actor could try to fork the chain immediately after a
 new address map account is added to a block. If successful, they could add a
 different unexpected map entry in the fork. In order to deter this attack,
 clients should wait for address maps to be finalized before using them in a
 transaction.  Clients may also append integrity check instructions to the
 transaction which verify that the correct accounts are used.
 ### Denial of service
 Address map accounts will be read very frequently and will therefore be a
 more high profile target for denial of service attacks through write locks
 similar to sysvar accounts.
 For this reason, special handling should be given to address map lookups.
 Address maps lookups should not be affected by account read/write locks.
 ### Duplicate accounts
 Transactions may not load an account more than once whether directly through
 `account_keys` or indirectly through `address_maps`.
 ## Other Proposals
 1) Account prefixes
 Needing to pre-register accounts in an on-chain address map is cumbersome
 because it adds an extra step for transaction processing. Instead, Solana
 transactions could use variable length address prefixes to specify accounts.
 These prefix shortcuts can save on data usage without needing to setup on-chain
 state.
 However, this model requires nodes to keep a mapping of prefixes to active account
 addresses. Attackers can create accounts with the same prefix as a popular account
 to disrupt transactions.
 2) Transaction builder program
 Solana can provide a new on-chain program which allows "Big" transactions to be
 constructed on-chain by normal transactions. Once the transaction is
 constructed, a final "Execute" transaction can trigger a node to process the big
 transaction as a normal transaction without needing to fit it into an MTU sized
 packet.
 The UX of this approach is tricky. A user could in theory sign a big transaction
 but it wouldn't be great if they had to use their wallet to sign multiple
 transactions to build that transaction that they already signed and approved. This
 could be a use-case for transaction relay services, though. A user could pay a
 relayer to construct the large pre-signed transaction on-chain for them.
 In order to prevent the large transaction from being reconstructed and replayed,
 its message hash will need to be added to the status cache when executed.
 3) Epoch account indexes
 Similarly to leader schedule calculation, validators could create a global index
 of the most accessed accounts in the previous epoch and make that index
 available to transactions in the following epoch.
 This approach has a downside of only updating the index at epoch boundaries
 which means there would be a few day delay before popular new accounts could be
 referenced. It also needs to be consistently generated by all validators by
 using some criteria like adding accounts in order by access count.
 4) Address lists
 Extend the transaction structure to support addresses that, when loaded, expand
 to a list of addresses. After expansion, all account inputs are concatenated to
 form a single list of account keys which can be indexed into by instructions.
 Address lists would likely need to be immutable to prevent attacks. They would
 also need to be limited in length to limit resource consumption.
 This proposal can be thought of a special case of the proposed index account
 approach. Since the full account list would be expanded, there's no need to add
 additional offsets that use up the limited space in a serialized transaction.
 However, the expected size of an address list may need to be encoded into the
 transaction to aid the sanitization of account indexes.  We would also need to
 encode how many addresses in the list should be loaded as readonly vs
 read-write. Lastly, special attention must be given to watch out for addresses
 that exist in multiple account lists.