# Cluster Test Framework

This document proposes the Cluster Test Framework (CTF). CTF is a test harness that allows tests to execute against a local, in-process cluster or a deployed cluster.

## Motivation

The goal of CTF is to provide a framework for writing tests independent of where and how the cluster is deployed. Regressions can be captured in these tests, and the tests can be run against deployed clusters to verify the deployment. The focus of these tests should be on cluster stability, consensus, fault tolerance, and API stability.

Tests should verify a single bug or scenario, and should be written with the least amount of internal plumbing exposed to the test.

## Design Overview

Tests are provided an entry point, which is a `contact_info::ContactInfo` structure, and a keypair that has already been funded.

Each node in the cluster is configured with a `validator::ValidatorConfig` at boot time. This configuration specifies any extra cluster configuration required for the test. The cluster should boot with the same configuration whether it runs in-process or in a data center.

Once the cluster has booted, the test discovers it through a gossip entry point and configures any runtime behaviors via validator RPC.

## Test Interface

Each CTF test starts with an opaque entry point and a funded keypair. The test should not depend on how the cluster is deployed, and should be able to exercise all the cluster functionality through the publicly available interfaces.

```text
use crate::contact_info::ContactInfo;
use solana_sdk::signature::{Keypair, Signer};

pub fn test_this_behavior(
    entry_point_info: &ContactInfo,
    funding_keypair: &Keypair,
    num_nodes: usize,
) {
    // Test body: exercise the cluster through its public interfaces only.
}
```
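Because the signature is the only contract between a test and its environment, the same test function can be handed to any harness that can supply an entry point and a funded key. The following is a minimal sketch of that idea, not code from the Solana repository: `EntryPoint`, `FundedKey`, `ClusterTest`, and `run_all` are hypothetical stand-ins invented for illustration.

```rust
use std::net::SocketAddr;

/// Hypothetical stand-in for `ContactInfo`: only the gossip address is modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct EntryPoint {
    pub gossip: SocketAddr,
}

/// Hypothetical stand-in for a funded `Keypair`: only the balance is modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct FundedKey {
    pub lamports: u64,
}

/// Every test shares this shape, so a harness needs no per-test knowledge.
pub type ClusterTest = fn(&EntryPoint, &FundedKey, usize) -> Result<(), String>;

/// Example test: checks only its inputs, never how the cluster was deployed.
pub fn test_this_behavior(
    _entry_point: &EntryPoint,
    funding: &FundedKey,
    num_nodes: usize,
) -> Result<(), String> {
    if num_nodes == 0 {
        return Err("expected at least one node".to_string());
    }
    if funding.lamports == 0 {
        return Err("funding key has no balance".to_string());
    }
    Ok(())
}

/// Runs each named test against one cluster and collects the outcomes.
pub fn run_all(
    tests: &[(&str, ClusterTest)],
    entry_point: &EntryPoint,
    funding: &FundedKey,
    num_nodes: usize,
) -> Vec<(String, Result<(), String>)> {
    tests
        .iter()
        .map(|(name, test)| (name.to_string(), test(entry_point, funding, num_nodes)))
        .collect()
}
```

A local harness and a deployed-cluster harness would differ only in how they obtain the `EntryPoint` and `FundedKey` values passed to `run_all`.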

## Cluster Discovery

At test start, the cluster has already been established and is fully connected. The test can discover most of the available nodes within a few seconds.

```text
use crate::gossip_service::discover_nodes;

// Discover the cluster over a few seconds.
let cluster_nodes = discover_nodes(&entry_point_info, num_nodes);
```
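Discovery of this kind is usually a poll-until-enough-or-timeout loop. The sketch below shows that pattern in isolation, assuming nothing about the real `discover_nodes` implementation: `discover_until` and its `poll` closure are hypothetical names, and the closure stands in for whatever reads the current gossip table.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `poll` until it reports at least `num_nodes` entries or `timeout`
/// elapses. Returns `Ok` with the full set, or `Err` with the partial set
/// seen on the last poll so the caller can decide whether "most" is enough.
pub fn discover_until<T>(
    mut poll: impl FnMut() -> Vec<T>,
    num_nodes: usize,
    timeout: Duration,
    interval: Duration,
) -> Result<Vec<T>, Vec<T>> {
    let deadline = Instant::now() + timeout;
    loop {
        let nodes = poll();
        if nodes.len() >= num_nodes {
            return Ok(nodes);
        }
        if Instant::now() >= deadline {
            return Err(nodes);
        }
        // Wait before polling again so the loop does not spin.
        sleep(interval);
    }
}
```

Returning the partial set on timeout, rather than discarding it, matches the expectation that a test may see most but not all nodes within the window.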

## Cluster Configuration

To enable specific scenarios, the cluster needs to be booted with special configurations. These configurations can be captured in `validator::ValidatorConfig`.

For example:

```text
let mut validator_config = ValidatorConfig::default();
validator_config.rpc_config.enable_validator_exit = true;
let local = LocalCluster::new_with_config(
    num_nodes,
    10_000,
    100,
    &validator_config,
);
```
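One way to read the example above is that test-only behaviors are opt-in flags on a default configuration, and that every node boots from the same value. The sketch below illustrates that reading with stand-in types. `SketchRpcConfig`, `SketchValidatorConfig`, `exit_test_config`, and `per_node_configs` are hypothetical, and the assumption that the flag defaults to `false` is made for this sketch only.

```rust
/// Hypothetical stand-in for the RPC portion of a validator configuration.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct SketchRpcConfig {
    /// Assumed to default to `false` so production nodes never expose it.
    pub enable_validator_exit: bool,
}

/// Hypothetical stand-in for `ValidatorConfig`.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct SketchValidatorConfig {
    pub rpc_config: SketchRpcConfig,
}

/// Builds the configuration for a test that must stop validators over RPC.
pub fn exit_test_config() -> SketchValidatorConfig {
    SketchValidatorConfig {
        rpc_config: SketchRpcConfig {
            enable_validator_exit: true,
        },
    }
}

/// Clones one configuration per node so every validator boots identically,
/// whether the nodes run in-process or on separate machines.
pub fn per_node_configs(
    config: &SketchValidatorConfig,
    num_nodes: usize,
) -> Vec<SketchValidatorConfig> {
    vec![config.clone(); num_nodes]
}
```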

## How to Design a New Test

Suppose a bug shows that the cluster fails when it is flooded with invalid advertised gossip nodes. The gossip library and protocol may change, but the cluster must stay resilient to such floods.

Configure the RPC service:

```text
let mut validator_config = ValidatorConfig::default();
validator_config.rpc_config.enable_rpc_gossip_push = true;
validator_config.rpc_config.enable_rpc_gossip_refresh_active_set = true;
```

Wire the RPCs and write a new test:

```text
pub fn test_large_invalid_gossip_nodes(
    entry_point_info: &ContactInfo,
    funding_keypair: &Keypair,
    num_nodes: usize,
) {
    let cluster = discover_nodes(&entry_point_info, num_nodes);

    // Poison the cluster.
    let client = create_client(entry_point_info.client_facing_addr(), VALIDATOR_PORT_RANGE);
    for _ in 0..(num_nodes * 100) {
        client.gossip_push(
            cluster_info::invalid_contact_info()
        );
    }
    sleep(Duration::from_millis(1000));

    // Force refresh of the active set.
    for node in &cluster {
        let client = create_client(node.client_facing_addr(), VALIDATOR_PORT_RANGE);
        client.gossip_refresh_active_set();
    }

    // Verify that spends still work.
    verify_spends(&cluster);
}
```
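The flood-then-refresh flow can be prototyped against an in-memory model before any RPC is wired. The sketch below is a toy, not Solana's gossip implementation: `Peer`, `MockNode`, `is_valid`, and `survives_invalid_flood` are hypothetical, and the validity rule (address not unspecified, port non-zero) is an assumption chosen for illustration.

```rust
use std::net::SocketAddr;

/// Hypothetical stand-in for a gossip contact entry.
#[derive(Clone, Debug, PartialEq)]
pub struct Peer {
    pub id: u64,
    pub gossip: SocketAddr,
}

/// Assumed validity rule for this sketch: the address must not be the
/// unspecified address (0.0.0.0) and the port must be non-zero.
pub fn is_valid(peer: &Peer) -> bool {
    !peer.gossip.ip().is_unspecified() && peer.gossip.port() != 0
}

/// Toy node that keeps a peer table and an active set drawn from it.
#[derive(Default)]
pub struct MockNode {
    table: Vec<Peer>,
    active_set: Vec<Peer>,
}

impl MockNode {
    /// Accepts a pushed entry only if it passes the validity rule.
    pub fn gossip_push(&mut self, peer: Peer) {
        if is_valid(&peer) {
            self.table.push(peer);
        }
    }

    /// Rebuilds the active set from the first `max` entries of the table.
    pub fn gossip_refresh_active_set(&mut self, max: usize) {
        self.active_set = self.table.iter().take(max).cloned().collect();
    }

    /// Returns the peers currently in the active set.
    pub fn active_set(&self) -> &[Peer] {
        &self.active_set
    }
}

/// Floods `node` with `count` invalid entries, refreshes the active set, and
/// reports whether the active set still contains only valid peers.
pub fn survives_invalid_flood(node: &mut MockNode, count: usize, max_active: usize) -> bool {
    let invalid: SocketAddr = "0.0.0.0:0".parse().expect("literal address parses");
    for id in 0..count {
        node.gossip_push(Peer { id: id as u64, gossip: invalid });
    }
    node.gossip_refresh_active_set(max_active);
    node.active_set().iter().all(is_valid)
}
```

A node model that stored invalid entries instead of rejecting them would fail the same check, which is the regression the CTF test is meant to catch on a real cluster.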