Add cluster test framework doc. (#3189)
This commit is contained in:
committed by Greg Fitzgerald
parent 0acdbc0d03
commit 5de38852d2
@@ -54,3 +54,4 @@
   - [Attack Vectors](ed_attack_vectors.md)
   - [References](ed_references.md)
   - [Leader-to-Validator Transition](leader-validator-transition.md)
+  - [Cluster Test Framework](cluster-test-framework.md)

122 book/src/cluster-test-framework.md (new file)
@@ -0,0 +1,122 @@
# Cluster Test Framework

This document proposes the Cluster Test Framework (CTF). CTF is a test harness
that allows tests to execute against a local, in-process cluster or a
deployed cluster.

## Motivation

The goal of CTF is to provide a framework for writing tests independent of where
and how the cluster is deployed. Regressions can be captured in these tests and
the tests can be run against deployed clusters to verify the deployment. The
focus of these tests should be on cluster stability, consensus, fault tolerance,
and API stability.

Tests should verify a single bug or scenario, and should be written with the
least amount of internal plumbing exposed to the test.

## Design Overview

Tests are provided an entry point, which is a `contact_info::ContactInfo`
structure, and a keypair that has already been funded.

Each node in the cluster is configured with a `fullnode::FullnodeConfig` at boot
time. This configuration specifies any extra cluster configuration required for
the test. The cluster should boot with the configuration whether it is run
in-process or in a data center.

Once booted, the test will discover the cluster through a gossip entry point and
configure any runtime behaviors via fullnode RPC.

## Test Interface

Each CTF test starts with an opaque entry point and a funded keypair. The test
should not depend on how the cluster is deployed, and should be able to exercise
all the cluster functionality through the publicly available interfaces.

```rust,ignore
use crate::contact_info::ContactInfo;
use solana_sdk::signature::{Keypair, KeypairUtil};

pub fn test_this_behavior(
    entry_point_info: &ContactInfo,
    funding_keypair: &Keypair,
    num_nodes: usize,
)
```
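To make the interface concrete, here is a self-contained sketch of how a harness might invoke tests with this signature. The `ContactInfo` and `Keypair` structs below are simplified stand-ins for illustration, not the real crate types, and `run_tests` is a hypothetical helper:

```rust
// Simplified stand-ins for the real crate types (assumptions, not the
// actual solana APIs).
#[derive(Clone, Debug)]
pub struct ContactInfo {
    pub gossip_addr: String,
}

pub struct Keypair {
    pub pubkey: String,
}

// A CTF test is any function with the standard signature.
type CtfTest = fn(&ContactInfo, &Keypair, usize);

// Run each named test against the same entry point and funder; return how
// many ran to completion.
pub fn run_tests(
    entry: &ContactInfo,
    funder: &Keypair,
    num_nodes: usize,
    tests: &[(&str, CtfTest)],
) -> usize {
    let mut passed = 0;
    for (name, test) in tests {
        println!("running {name}");
        test(entry, funder, num_nodes);
        passed += 1;
    }
    passed
}

fn test_this_behavior(entry: &ContactInfo, _funding: &Keypair, num_nodes: usize) {
    assert!(num_nodes > 0);
    assert!(!entry.gossip_addr.is_empty());
}

fn main() {
    let entry = ContactInfo { gossip_addr: "127.0.0.1:8001".to_string() };
    let funder = Keypair { pubkey: "FUNDER".to_string() };
    let passed = run_tests(&entry, &funder, 3, &[("test_this_behavior", test_this_behavior)]);
    println!("{passed} test(s) passed");
}
```

Because every test shares one signature, the harness can register tests as plain function pointers and run the same list in-process or against a deployed cluster.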

## Cluster Discovery

At test start, the cluster has already been established and is fully connected.
The test can discover most of the available nodes within a few seconds.

```rust,ignore
use crate::gossip_service::discover;

// Discover the cluster over a few seconds.
let cluster_nodes = discover(&entry_point_info, num_nodes);
```
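Under the hood, discovery is a polling loop: ask the gossip entry point for its view, and stop once `num_nodes` are visible or a timeout elapses. A self-contained sketch of that loop, with the gossip view mocked (the real `gossip_service::discover` talks to live nodes):

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Stand-in for the gossip view a node exposes. Here one more node "appears"
// every 10 ms so the polling loop terminates quickly.
fn visible_nodes(elapsed: Duration) -> Vec<String> {
    let n = (elapsed.as_millis() / 10) as usize;
    (0..n).map(|i| format!("node-{i}")).collect()
}

// Poll until `num_nodes` are visible or `timeout` elapses.
fn discover(num_nodes: usize, timeout: Duration) -> Result<Vec<String>, String> {
    let start = Instant::now();
    loop {
        let nodes = visible_nodes(start.elapsed());
        if nodes.len() >= num_nodes {
            return Ok(nodes);
        }
        if start.elapsed() > timeout {
            return Err(format!("found {} of {} nodes", nodes.len(), num_nodes));
        }
        sleep(Duration::from_millis(5));
    }
}

fn main() {
    let cluster = discover(3, Duration::from_secs(2)).expect("discovery failed");
    println!("discovered {} nodes", cluster.len());
}
```

The timeout keeps a partially-booted or partitioned cluster from hanging the test forever; a failed discovery is itself a useful test failure.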

## Cluster Configuration

To enable specific scenarios, the cluster needs to be booted with special
configurations. These configurations can be captured in
`fullnode::FullnodeConfig`.

For example:

```rust,ignore
let mut fullnode_config = FullnodeConfig::default();
fullnode_config.rpc_config.enable_fullnode_exit = true;
let local = LocalCluster::new_with_config(
    num_nodes,
    10_000,
    100,
    &fullnode_config,
);
```
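The pattern is plain Rust: start from `Default` and flip only the fields the scenario needs. A runnable sketch of that shape, where the structs are simplified stand-ins whose field names mirror the document, not the real solana types:

```rust
// Simplified stand-ins for illustration; not the real solana structs.
#[derive(Default, Debug, Clone)]
struct RpcConfig {
    enable_fullnode_exit: bool,
}

#[derive(Default, Debug, Clone)]
struct FullnodeConfig {
    rpc_config: RpcConfig,
}

fn main() {
    // Start from defaults, then enable only what the test scenario needs.
    let mut fullnode_config = FullnodeConfig::default();
    fullnode_config.rpc_config.enable_fullnode_exit = true;

    assert!(fullnode_config.rpc_config.enable_fullnode_exit);
    println!("{:?}", fullnode_config);
}
```

Keeping every test-only knob behind a `Default`-constructed config means a production boot path and a test boot path differ only in which fields are overridden.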

## How to design a new test

For example, there is a bug that shows that the cluster fails when it is flooded
with invalid advertised gossip nodes. Our gossip library and protocol may
change, but the cluster still needs to stay resilient to floods of invalid
advertised gossip nodes.

Configure the RPC service:

```rust,ignore
let mut fullnode_config = FullnodeConfig::default();
fullnode_config.rpc_config.enable_rpc_gossip_push = true;
fullnode_config.rpc_config.enable_rpc_gossip_refresh_active_set = true;
```

Wire the RPCs and write a new test:

```rust,ignore
pub fn test_large_invalid_gossip_nodes(
    entry_point_info: &ContactInfo,
    funding_keypair: &Keypair,
    num_nodes: usize,
) {
    let cluster = discover(&entry_point_info, num_nodes);

    // Poison the cluster.
    let mut client = mk_client(&entry_point_info);
    for _ in 0..(num_nodes * 100) {
        client.gossip_push(
            cluster_info::invalid_contact_info()
        );
    }
    sleep(Duration::from_millis(1000));

    // Force refresh of the active set.
    for node in &cluster {
        let mut client = mk_client(&node);
        client.gossip_refresh_active_set();
    }

    // Verify that spends still work.
    verify_spends(&cluster);
}
```
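The `verify_spends` helper is not defined in this document; one plausible shape is a transfer-and-check loop over every node, so the assertion is "the whole cluster still processes payments", not "one lucky node does". A self-contained sketch with a mocked node type (everything here is a stand-in for illustration):

```rust
// Stand-in node type; a real verify_spends would submit a transfer through
// each node's RPC client and confirm it lands.
struct Node {
    balance: u64,
}

impl Node {
    // Attempt a small spend; returns whether the node processed it.
    fn transfer(&mut self, lamports: u64) -> bool {
        if self.balance >= lamports {
            self.balance -= lamports;
            true
        } else {
            false
        }
    }
}

// Verify that every node in the cluster can still process a small spend.
fn verify_spends(cluster: &mut [Node]) -> bool {
    cluster.iter_mut().all(|node| node.transfer(1))
}

fn main() {
    let mut cluster = vec![Node { balance: 100 }, Node { balance: 100 }];
    assert!(verify_spends(&mut cluster));
    println!("spends verified on {} nodes", cluster.len());
}
```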