* Add Vote Contract
* Move ownership of LeaderScheduler from Fullnode to the bank
* Modified ReplicateStage to consume leader information from bank
* Restart RPC Services in Leader To Validator Transition
* Make VoteContract Context Free
* Remove voting from ClusterInfo and Tpu
* Remove dependency on ActiveValidators in LeaderScheduler
* Switch VoteContract to have two steps 1) Register 2) Vote. Change thin client to create + register a voting account on fullnode startup
* Remove check in leader_to_validator transition for unique references to bank, b/c jsonrpc service and rpcpubsub hold references through jsonhttpserver
* Use IV to make unique identities
* Use the hex! macro for hex literals instead of a string converted to a u8 slice
* fix sha sampling to control init/end of sha state
* Move ledger write to its own stage
- Also, rename write_stage to leader_vote_stage, as write functionality
is moved to a different stage
* Address review comments
* Fix leader rotation test failure
* address review comments
Addresses the following problems:
- Validators are not able to keep up with the leader
- The future blobs (outside of window) get dropped
- The validators won't process repair requests for these future blobs
* Add PoH height to process_ledger()
* Moved broadcast_stage Leader Scheduling logic to use Poh height instead of entry_height
* Moved LeaderScheduler logic to PoH in ReplicateStage
* Fix Leader scheduling tests to use PoH instead of entry height
* Change is_leader detection in repair() to use PoH instead of entry height
* Add tests to LeaderScheduler for new functionality
* fix Entry::new and genesis block PoH counts
* Moved LeaderScheduler to PoH ticks
* Cleanup to resolve PR comments
Budget now assumes the source account holds all tokens the program
should spend.
Note: the static guarantees implied by verify_plan() are meaningless
under the new contract engine. The bank no longer calls it. This
serves as a nice example of where comparing code coverage between
integration tests and unit tests would have shown us where a
change rendered unit tests meaningless.
Debits no longer need to be applied before credits. Instead, we
lock any accounts we'd debit and so error out on the second attempt
to lock the same account.
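A minimal sketch of that locking scheme, with hypothetical types and names (the real bank's bookkeeping differs): the first transaction to lock an account wins, and a second attempt errors out instead of being reordered.

```rust
use std::collections::HashSet;

type Pubkey = [u8; 32]; // simplified stand-in for a real public key

#[derive(Debug, PartialEq)]
enum BankError {
    AccountInUse,
}

// Lock every account a transaction would debit; error on any double-lock.
fn lock_accounts(locks: &mut HashSet<Pubkey>, keys: &[Pubkey]) -> Result<(), BankError> {
    for k in keys {
        if locks.contains(k) {
            return Err(BankError::AccountInUse);
        }
    }
    for k in keys {
        locks.insert(*k);
    }
    Ok(())
}

// Release the locks once the transaction's results are committed.
fn unlock_accounts(locks: &mut HashSet<Pubkey>, keys: &[Pubkey]) {
    for k in keys {
        locks.remove(k);
    }
}
```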
In the old bank (before the contract engine), Contract wasn't specific
to Budget. It provided the same service as what is now called
SystemProgram::Move, but without requiring a separate account.
- Testnet dashboard shows that channel pressure for write stage
is incrementing on every iteration of write.
- This change optimizes ledger writing by removing cloning of map
and reducing calls to flush
The integration tests are allowed to open sockets, so running them
in parallel may cause "Too many open files" errors. This patch
runs the unit tests in parallel and the integration test serially.
* Escape RUST_LOG configuration in remote-node.sh
- If it was set to #, it was causing other parameters to be commented out
* escape other variables as well
* disabled shell check
* Fix shellcheck error
* Upload bench output as build artifacts
* Fix tags types
* Pull previous stats from metrics
* Change the default branch for comparison
* Fix formatting
* Fix build errors
* Address review comments
* Dedup some common code
* Add eval for channel info to find branch name
Generate tick entry ids and only register ticks as the last_id expected by the bank. Since the bank is multi-threaded, the in-flight pipeline of transactions cannot be close to the end of the queue, or there is a high possibility that a starved thread will encode an expired last_id into the ledger. The banking_stage therefore uses a shorter age limit for encoded last_ids than the validators do.
The bench client doesn't send transactions that are older than 30 seconds.
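A sketch of that asymmetric age check; the constants and names here are illustrative, not the actual values.

```rust
// Illustrative constants: the real limits live in the bank and banking_stage.
const MAX_ENTRY_IDS: usize = 1024; // validator-side window of acceptable last_ids
const BANKING_STAGE_MAX_AGE: usize = MAX_ENTRY_IDS / 2; // tighter leader-side limit

/// Returns whether a transaction built on a last_id of the given age should
/// still be accepted at this point in the pipeline.
fn last_id_usable(age_in_ids: usize, in_banking_stage: bool) -> bool {
    let limit = if in_banking_stage {
        BANKING_STAGE_MAX_AGE
    } else {
        MAX_ENTRY_IDS
    };
    age_in_ids < limit
}
```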
* Added LeaderScheduler module and tests
* plumbing for LeaderScheduler in Fullnode + tests. Add vote processing for active set to ReplicateStage and WriteStage
* Add LeaderScheduler plumbing for Tvu, window, and tests
* Fix bank and switch tests to use new LeaderScheduler
* move leader rotation check from window service to replicate stage
* Add replicate_stage leader rotation exit test
* removed leader scheduler from the window service and associated modules/tests
* Corrected is_leader calculation in repair() function in window.rs
* Integrate LeaderScheduler with write_stage for leader to validator transitions
* Integrated LeaderScheduler with BroadcastStage
* Removed gossip leader rotation from crdt
* Add multi validator, leader test
* Comments and cleanup
* Remove unneeded checks from broadcast stage
* Fix case where a validator/leader need to immediately transition on startup after reading ledger and seeing they are not in the correct role
* Set new leader in validator -> validator transitions
* Clean up for PR comments, refactor LeaderScheduler from process_entry/process_ledger_tail
* Cleaned out LeaderScheduler options, implemented LeaderScheduler strategy that only picks the bootstrap leader to support existing tests, drone/airdrops
* Ignore test_full_leader_validator_network test due to bug where the next leader in line fails to get the last entry before rotation (b/c it hasn't started up yet). Added a test test_dropped_handoff_recovery to track this bug
* Change format of data for TPS/Finality metrics in testnet automation
* Revert number of nodes for testnet automation
* Split python command to its own script
* Fix python command line arguments
lua_State is not preserved across runs and account userdata is not converted into
Lua values. All this allows us to do is manipulate the number of tokens
in each account and DoS the Fullnode with those three little words,
"repeat until false".
Why bother? Research. rlua's project goals are well-aligned with the LAMPORT runtime.
What's next:
* rlua to add security limits, such as number of instructions executed
* Add a way to deserialize Account::userdata OR use Account::program_id
to look up a metatable for lua_newuserdata().
* Add support for different node counts
* Update variable names
* Delete network even after failures
* Add array for node counts
* Changed number of nodes to a space separated string of numbers
* Adjust number of nodes
* Snap will not be published if the env variable DO_NOT_PUBLISH_SNAP is set
* Address review comments
* Replaced influx db URL
Uploads two reports to Buildkite, one from cargo-cov and one from lcov via grcov. The lcov one is busted on Linux and is what we need to bring codecov.io back up again. It works great on macOS if you want to generate them locally and prefer lcov HTML reports.
* Also comment out non-coverage build to speed things up.
This reduces how much memory is written to the last_id_sigs table on every TX, and has a 40% impact on
`cargo +nightly watch -x 'bench bench_banking_stage'`
See #1157 for details. The `from` account should be cloned
before execute_transaction(), and that's the only one that should
be stored if there's an error executing the program.
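A sketch of that rule with simplified, hypothetical types; `execute_transaction` here is a stand-in for the real program execution.

```rust
use std::collections::HashMap;

#[derive(Clone, Default)]
struct Account {
    tokens: i64,
}
type Pubkey = u64; // simplified stand-in for a real public key

fn execute_transaction(_accounts: &mut [Account]) -> Result<(), ()> {
    Ok(()) // the program body is elided
}

// Clone `from` before execution; if the program errors, store only that clone
// so the fee sticks but no partial program state leaks into the bank.
fn process(bank: &mut HashMap<Pubkey, Account>, keys: &[Pubkey], mut working: Vec<Account>) {
    let from = working[0].clone(); // the fee has already been debited here
    match execute_transaction(&mut working) {
        Ok(()) => {
            for (k, a) in keys.iter().zip(working) {
                bank.insert(*k, a); // success: store every touched account
            }
        }
        Err(_) => {
            bank.insert(keys[0], from); // error: store only the `from` account
        }
    }
}
```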
* Add check in window_service to exit in checks for leader rotation, and propagate that service exit up to fullnode
* Added logic to shutdown Tvu once ReplicateStage finishes
* Added test for successfully shutting down validator and starting up leader
* Add test for leader validator interaction
* fix streamer to check for exit signal before checking socket again to prevent busy leaders from never returning
* PR comments - Rewrite make_consecutive_blobs() function, revert genesis function change
lastidnotfound step 2:
* move "record stage", aka poh_service into banking stage
* remove Entry.has_more, is incompatible with leader rotation
* rewrite entry_next_hash in terms of Poh (see the sketch after this list)
* simplify and unify transaction hashing (no embedded nulls)
* register_last_entry from banking stage, fixes #1171 (w00t!)
* new PoH doesn't generate empty ledger entries, so some fixes necessary in
multinode tests that rely on that (e.g. giving validators airdrops)
* make window repair less patient, if we've been waiting for an answer,
don't be shy about most recent blobs
* delete recorder and record stage
* make thin_client error reporting more verbose
* more tracing in window (sigh)
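A rough sketch of the entry_next_hash rewrite mentioned above, assuming the sha2 crate; the types and function names are illustrative, not the exact API.

```rust
use sha2::{Digest, Sha256};

type Hash = [u8; 32];

// One tick of the PoH chain: hash the previous state.
fn extend(id: &Hash) -> Hash {
    Sha256::digest(id).into()
}

// Mix transaction data into the chain so it's provably recorded after `id`.
fn record(id: &Hash, mixin: &Hash) -> Hash {
    let mut hasher = Sha256::new();
    hasher.update(id);
    hasher.update(mixin);
    hasher.finalize().into()
}

// entry_next_hash expressed in terms of the two PoH operations above.
fn entry_next_hash(start: &Hash, num_hashes: u64, mixin: Option<&Hash>) -> Hash {
    let mut id = *start;
    for _ in 1..num_hashes {
        id = extend(&id);
    }
    match mixin {
        Some(m) => record(&id, m),
        None if num_hashes > 0 => extend(&id),
        None => id,
    }
}
```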
* Add hooks for executing the storage contract
* Add store_ledger stage
Similar to replicate_stage but no voting/banking stuff, just convert
blobs to entries and write the ledger out
* Add storage_addr to tests and add new NodeInfo constructor
to reduce duplication...
* Move recycler instances to the point of allocation
* sinks no longer need to call `recycle`
* Remove the recycler arguments from all the apis that no longer need them
- The write stage will output vector of entries
- Broadcast stage will create blobs out of the entries
- Helps reduce MIPS requirements for write stage
* Move register_entry_id() call out of write stage
- Write stage is MIPS intensive and has become a bottleneck for
TPU pipeline
- This will reduce the MIPS requirements for the stage
* Fix rust format issues
* budget and system contracts and verification
* contract check_id methods
* system call contract
* verify contract execution rules
* move system into its own file
* allocate before transfer for budget
* store error in budget context
* budget contract and tests without bank
* moved budget out of bank
* Tweak GCE scripts for higher node count
- Some validators were unable to rsync config from the leader when
the node count was high (e.g. 25). It looks like the leader node was
getting more rsync requests in parallel than it could handle.
- This change staggers the validators' bootup and rsync times
* Address review comments
* Use multiple sockets for receiving blobs on validators
- The blobs that are broadcasted by leader or retransmitted by peer
validators are received on replicate_port
- Using reuse_addr/reuse_port, multiple sockets can be opened for
the same port
- This allows the kernel to queue data to user space app on multiple
socket queues, preventing over-running one queue
- This helps with reducing packets dropped due to queue over-runs
Fixes #1224
* Fixed failing tests
* Implemented recvmmsg() for UDP packets
- This change implements a binding to the libc recvmmsg() API (see the sketch after this list)
- The function can receive multiple packets using one system call
Fixes #1141
* Added unit tests for recvmmsg()
* Added recv_mmsg() wrapper for non Linux OS
* Address review comments for recvmmsg()
* Remove unnecessary imports
* Moved target specific dependencies to the function
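A minimal sketch of such a binding via the libc crate (Linux-only); the buffer sizes and names are illustrative, not the exact API that landed.

```rust
use std::net::UdpSocket;
use std::os::unix::io::AsRawFd;

const NUM_RCVMMSGS: usize = 16;
const PACKET_SIZE: usize = 1280;

fn recv_mmsg(
    sock: &UdpSocket,
    bufs: &mut [[u8; PACKET_SIZE]; NUM_RCVMMSGS],
) -> std::io::Result<usize> {
    let mut iovs: [libc::iovec; NUM_RCVMMSGS] = unsafe { std::mem::zeroed() };
    let mut addrs: [libc::sockaddr_storage; NUM_RCVMMSGS] = unsafe { std::mem::zeroed() };
    let mut hdrs: [libc::mmsghdr; NUM_RCVMMSGS] = unsafe { std::mem::zeroed() };

    for i in 0..NUM_RCVMMSGS {
        iovs[i].iov_base = bufs[i].as_mut_ptr() as *mut libc::c_void;
        iovs[i].iov_len = PACKET_SIZE;
        hdrs[i].msg_hdr.msg_iov = &mut iovs[i];
        hdrs[i].msg_hdr.msg_iovlen = 1;
        hdrs[i].msg_hdr.msg_name = &mut addrs[i] as *mut _ as *mut libc::c_void;
        hdrs[i].msg_hdr.msg_namelen = std::mem::size_of::<libc::sockaddr_storage>() as u32;
    }

    // One system call pulls in up to NUM_RCVMMSGS datagrams.
    let npkts = unsafe {
        libc::recvmmsg(
            sock.as_raw_fd(),
            hdrs.as_mut_ptr(),
            NUM_RCVMMSGS as u32,
            0,
            std::ptr::null_mut(),
        )
    };
    if npkts < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(npkts as usize)
}
```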
In the error case that i>0 (we have blobs to send)
we break out of the loop and do not push the allocated r
to the v array. We should recycle this blob, otherwise it
will be dropped.
* fix poll_gossip_for_leader() loop to actually wait
for 30 seconds
* reduce reuseaddr use to only when necessary,
try to avoid already bound sockets
* move nat.rs to netutil.rs
* add gossip tracing to thin_client and bench-tps
* Migrate Budget DSL to use the Account state instead of global bank data structures.
* Serialize Instruction into Transaction::userdata.
* Store the pending set in the Account::userdata
* Enforce the token balance rules on contract execution. This becomes the entry point for generic contracts.
* This PR will have a performance impact on the bank. The next set of changes will fix this by locking each account during multi-threaded execution of all the contracts.
* With this change a contract transaction needs to store its state under an address. That address could be the destination of the tokens, or any random address. For the latter, an extra step would be needed to claim the tokens which isn't implemented by budget_dsl at the moment.
* test tracking issue 1157
* remove client.sh from snap
* default to ephemeral instead of ~/.config key
* rework CLI for bench-tps
* remove multinode-demo stuff from remote-client.sh
* remove multinode-demo from remote-sanity and localnet-sanity
It needed to be passed the lock before, because it contained a
branch where one side didn't require locking. Now that that
defensive programming was hoisted, we can hoist the write lock
as well, leaving a simpler function for unit testing.
Even if transactions are dropped, accounts will have a buffer
of tokens. This should reduce or eliminate the AccountNotFound errors seen in the
leader while bench-tps is running.
* Reuse UDP port and open multiple sockets for transaction address
* Fixed failing crdt tests
* Add tests for reusing UDP ports
* Address review comments
* Updated bench-streamer to use multiple receive sockets
* Fix minimum number of recv sockets for bench-streamer
* Address review comments
Fixes #1132
* Moved bind_to function to nat.rs
* use USER instead of whoami
make gcloud_FigureRemoteUsername robust against unsolicited output
(that I get on login ;) )
validate --prefix argument
* Update gcloud.sh
If the balance goes to 0, the bank removes the account
from its account table and returns a no-account error. The thin client
should also update the account to this state, or it will
still have the cached balance from the last successful get_balance().
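A sketch of that cache fix, with hypothetical names and a simplified key type.

```rust
use std::collections::HashMap;

struct ThinClient {
    balances: HashMap<u64, i64>, // pubkey (simplified) -> cached balance
}

impl ThinClient {
    fn on_balance_reply(&mut self, key: u64, reply: Result<i64, ()>) {
        match reply {
            Ok(balance) => {
                self.balances.insert(key, balance);
            }
            // The account went to 0 and was removed server-side: invalidate
            // the cached entry rather than keep the stale balance.
            Err(()) => {
                self.balances.remove(&key);
            }
        }
    }
}
```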
* rename NodeInfo field of Node from "data" to "info"
(touches a lot of files)
* update client to use gossip to find leader, a la drone
* rework multinode scripts
* move more stuff into rust
* added usage to all
* no more rsync unless you're a validator (TODO: whack that, too)
* fullnode doesn't bail if drone isn't up yet, just keeps trying
* drone doesn't bail if network isn't up yet, just keeps trying
* remove trailing whitespace in ci/audit.sh
* code review fixups
* rename GOSSIP_PORT_RANGE => SOLANA_PORT_RANGE
* remove out-of-date TODO in localnet-sanity.sh
* remove features=test and the code that was using it (localhost prohibitions in
crdt); added a TODO in crdt.rs: maybe we should boot localhost in production
networks?
* boot tvu_window from NodeInfo: instead, send repair requests from the repair
socket (to gossip on peer) and answer repair requests via the sockaddr
from the repair request
* remove various unused pub functions
* banish SocketAddr parse().unwrap() to a macro that can also accept simpler stuff
* move gossip/NCP off assuming anything about its address
* use a single socket to send and receive gossip
* remove --addr/-a from CLIs
* rearrange networking utility code
* use Arc<UdpSocket> to share the Sync-safe UdpSocket among threads
* rename TestNode to Node
TODO:
* re-enable 127.0.0.1 as a valid address in crdt
* change repair request/response to a similar, single socket
* pick cloned sockets or Arc<UdpSocket> for all these (rpu uses tryclone())
* update contact_info with network truthiness instead of what the node
says?
And update transaction offsets to use the same approach as packet.rs.
Maybe this should be serialized_size(), but thanks to this
GenericArray update, those values are the same.
* Added poll_balance_with_timeout method
- updated bench-tps, fullnode and wallet to use this method instead
of repeatedly calling poll_get_balance()
* Address review comments
- Revert some changes to use wrapper poll_get_balance()
* Reverting bench-tps to use poll_get_balance
- The original code is checking if the balance has been updated,
instead of just retrieving the balance. The logic is different
than poll_balance_with_timeout()
* Reverting wallet to use poll_get_balance
- The break condition in the loop is different than poll_balance_with_timeout().
It's checking if the balance has been updated.
The situation is that there can be bad entries in
the bench-tps CRDT table until they get purged later. Threads, however,
are created for those bad entries, hang trying
to get the transaction_count from those bad addresses, and never end.
* Revert benchmarks back to libtest
Criterion has too many dependencies, its execution was slower, and
we didn't see the kind of precision we had hoped for that would let
us use it to block CI builds.
* Ignore benchmarks that take more than a few milliseconds per iteration
* Revert "Ignore benchmarks that take more than a few milliseconds per iteration"
This reverts commit b87cdf6ef4.
* Don't run benchmarks in CI
They are already built in the nightly build. Executing them in CI
doesn't add much value until the results are precise enough to act
on.
# This is the 1st commit message:
Fix testnet readme
# This is the commit message #2:
update
# This is the commit message #3:
typo
# This is the commit message #4:
cleanup
comments
fixups!
fixups!
fixups for a real Result<> from get_balance()
on 2nd thought, be more rigorous
Merge branch 'rob-solana-accounts_with_state' into accounts_with_state
update
review comments
comments
get rid of option
- Some nodes don't have leader information while the leader is broadcasting
blobs to those nodes, so those blobs are not retransmitted. This change
retransmits the blobs once the leader's identity is known.
* add json, which does the thing with json, move print to Rust's {:?}
* add --head NUM, to limit how much work gets done for print, json, verify
* add verify-internal, which very carefully checks ledger format without
trying first to "recover" it
* exit with errors on mis-usage
We'll avoid introducing three-letter terms to free up the namespace
for three-letter acronyms.
But recognize the term "sigverify", a verb, to verify a digital
signature.
clear flags on fresh blobs, lest they sometimes impersonate coding blobs...
fix bug: advance *received whether the blob_index is in the window or not,
failure to do so results in a stalled repair request pipeline
The bank will only register ids when has_more is not set, because those are
the only ids it has advertised, so it will not register all ids. The entry
stream, however, contains an unbroken last_id chain, so we
need to track that chain to get the correct start hash.
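A sketch of that bookkeeping with illustrative types: track every id while replaying, register only the advertised ones.

```rust
struct EntryLite {
    id: [u8; 32],
    has_more: bool,
}

fn register_entry_id(_id: &[u8; 32]) {}

// Track every entry id while replaying; only advertised ids (has_more unset)
// were registered by the bank, but the chain itself is unbroken.
fn replay(entries: &[EntryLite], mut start_hash: [u8; 32]) -> [u8; 32] {
    for e in entries {
        start_hash = e.id;
        if !e.has_more {
            register_entry_id(&e.id);
        }
    }
    start_hash
}
```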
All CLI apps that use clap (in other words, except for bench-streamer)
can use crate_version! to take the version from Cargo.toml.
This change addresses #700.
Don't recover() for copy(), as copy() is already tolerant of things
recover() guards against. Note: recover() is problematic if the ledger is
"live", i.e. is currently being written to.
* add recover_ledger() to deal with expected common ledger corruptions
* add verify_ledger() for future use cases (ledger-tool)
* increase ledger testing
* allow replicate stage to run without a ledger
* ledger-tool to output valid json
This was introduced to mask the occasional failure of racy tests. But this is misguided: it helps hide the true problem, the racy test, and it causes builds that fail deterministically to retry only to fail once again.
* fixup!
* fixups!
* send the vote and count it
* actually vote
* test
* Spelling fixes
* Process the voting transaction in the leader's bank
* Send tokens to the leader
* Give leader tokens in more cases
* Test for write_stage::leader_vote
* Request airdrop inside fullnode and not the script
* Change readme to indicate that drone should be up before leader
And start drone before leader in snap scripts
* Rename _kp => _keypair for keypairs and other review fixups
* Remove empty else
* tweak test_leader_vote numbers to be closer to testing 2/3 boundary
* combine creating blob and transaction for leader/validator
- No sigverify if feature sigverify_cpu_disable is used
- Purge validators in the test if lag count increases beyond
SOLANA_DYNAMIC_NODES_PURGE_LAG environment variable
- Other useful log messages in the test
While polling for a non-zero balance, it's not uncommon for one of the
get_balance requests to fail with EWOULDBLOCK. Previously when a get_balance
request failure occurred on the last iteration of the polling loop,
poll_get_balance returned an error even though the N-1 iterations may have
successfully retrieved a balance of 0.
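A sketch of the tolerant polling loop described here; the signature and retry counts are hypothetical.

```rust
use std::{thread, time::Duration};

// Poll for a non-zero balance, but remember a successfully observed 0 so a
// transient request failure on the last iteration doesn't discard it.
fn poll_get_balance(
    get: impl Fn() -> Result<i64, std::io::Error>,
) -> Result<i64, std::io::Error> {
    let mut balance: Result<i64, std::io::Error> =
        Err(std::io::Error::new(std::io::ErrorKind::Other, "no reply"));
    for _ in 0..10 {
        match get() {
            Ok(b) if b > 0 => return Ok(b),
            Ok(b) => balance = Ok(b), // remember a successful zero reply
            Err(_) => {}              // ignore transient failures (e.g. EWOULDBLOCK)
        }
        thread::sleep(Duration::from_millis(100));
    }
    balance
}
```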
erasure coding blobs were being counted as window slots, skewing transmit_index
erasure coding blobs were being skipped over for broadcast, because they're
only generated when the last data blob in an erasure block is added to the
window.... rewind the index to pick up and broadcast those coding blobs
- Create a known_hosts file if it doesn't exist
Otherwise ssh-keygen exits
- Move some common rsync code to common_start_setup
- Build the project before deploying it
Clippy told us to change function parameters to references, but
wasn't able to then tell us that the clone() before borrowing
was superfluous. This patch removes those by hand.
No expectation of a performance improvement here, since we were
just cloning reference counts. Just removes a bunch of noise.
* log responder error to warn
* log responder error to warn
* fixup!
* fixed assert
* fixed bad ports issue
* comments
* test for dummy address in Crdt::new instead of NodeInfo::new
* return error if ContactInfo supplied to Crdt::new cannot be used to connect to network
* comments
Keep stdout clean for the actual program. This is a specific concern for the
wallet command, where there exist tests that capture stdout from the wallet to
confirm transactions.
- This script outputs the IP address array that can be used
with start_nodes script to launch multinode demo
- Changes to start_nodes to compress files for rsync
wip
voting
wip
move voting into the replicate stage
update
fixup!
fixup!
fixup!
fixup!
fixup!
fixup!
fixup!
fixup!
fixup!
fixup!
update
fixup!
fixup!
fixup!
tpu processing votes in entries before write stage
fixup!
fixup!
txs
make sure validators have an account
fixup!
fixup!
fixup!
exit fullnode correctly
exit on exit not err
try 50
add delay for voting
300
300
startup logs
par start
100
no rayon
retry longer
log leader drop
fix distance
50 nodes
100
handle deserialize error
update
fix broadcast
new table every time
tweaks
table
update
try shuffle table
skip kill
skip add
purge test
fixed tests
rebase 2
fixed tests
fixed rebase
cleanup
ok for blobs to be longer than window
fix init window
60 nodes
Per @aeyakovenko, contracts shouldn't trust the network for
timestamps. Instead, pass the verified public key to the
contract and let it decide if that's a public key it wants
to trust the timestamp from.
Fixes #405
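A sketch of that check with hypothetical types; the real contract state differs, but the rule is the same: only a key the contract already trusts may witness a timestamp.

```rust
use std::time::SystemTime;

type Pubkey = [u8; 32];

struct BudgetState {
    dt_pubkey: Pubkey, // the key this contract trusts for timestamps
}

impl BudgetState {
    fn apply_timestamp(&mut self, from: &Pubkey, _dt: SystemTime) -> Result<(), ()> {
        // The verified signer key is passed in; the network's word alone
        // is not enough to advance time-based conditions.
        if *from != self.dt_pubkey {
            return Err(());
        }
        // ... process the witnessed timestamp ...
        Ok(())
    }
}
```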
Error handling is still clumsy. We should switch to something like
`error-chain` or `Result<T, Box<Error>>`, but until then, we can
at least be consistent across modules.
This also fixes a bug in the thin client where a nonexistent account
would have triggered a panic because we were using `balances[k]` instead
of `balances.get(key)`.
Fixes #534
Reaching into the stages' structs for their receivers is, in hindsight,
more awkward than returning multiple values from constructors. By
returning the receiver, the caller can name the receiver whatever it
wants (as you would with any return value), and doesn't need to
reach into the struct for the field (which is super awkward in
combination with move semantics).
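A sketch of that constructor style with an illustrative stage and message type: new() hands back the receiver instead of burying it in a struct field.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread::{self, JoinHandle};

struct FetchStage {
    thread_hdl: JoinHandle<()>,
}

impl FetchStage {
    fn new() -> (Self, Receiver<Vec<u8>>) {
        let (sender, receiver) = channel();
        let thread_hdl = thread::spawn(move || {
            // ... produce packets ...
            let _ = sender.send(vec![0u8; 8]);
        });
        (FetchStage { thread_hdl }, receiver)
    }

    fn join(self) -> std::thread::Result<()> {
        self.thread_hdl.join()
    }
}

// Usage: the caller names the receiver whatever it wants, with no
// reaching into the struct and no fighting move semantics.
fn main() {
    let (stage, packet_receiver) = FetchStage::new();
    let packets = packet_receiver.recv().unwrap();
    assert_eq!(packets.len(), 8);
    stage.join().unwrap();
}
```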
The new read_entries() works, but is overly-constrained. Not
using that function yet, but adding it here in the hopes some
Rust guru will tell us how to get that lifetime constraint out
of there.
Fixes #517
1. Use pre-installed host rust toolchain
2. Build reference/performance fullnode in same part to avoid rebuilding libraries
3. Merge scripts into same part
UPnP is now used to request a port on the NAT be forwarded to the local machine.
This obviously only works for NATs that support UPnP, and thus is not a panacea
for all NAT-related connectivity issues.
Notable hacks in this patch include a transmit/receive UDP socket pair to work
around current protocol limitations whereby the full node assumes its peer can
receive on the same UDP port it transmitted from.
* The built code is loaded to the nodes
* ssh_keys can be copied to the nodes for internode comm
* The nodes are started with their respective roles
* The client demo is started on the last node
Otherwise we will just warn about overrun and not insert the new blob.
Also, break if the index we find is less than consumed; otherwise we
can loop forever.
Instead of asserting ref count is 1 before recycling, allow users
to recycle items early. If it turns out that was too early, and
allocate() wants to return it, then boot it and take a memory
allocation performance hit instead.
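A minimal sketch of that policy, assuming Arc-based items; names are illustrative, not the real recycler API.

```rust
use std::sync::{Arc, Mutex};

struct Recycler<T: Default> {
    gc: Mutex<Vec<Arc<Mutex<T>>>>,
}

impl<T: Default> Recycler<T> {
    fn new() -> Self {
        Recycler { gc: Mutex::new(vec![]) }
    }

    fn allocate(&self) -> Arc<Mutex<T>> {
        let mut gc = self.gc.lock().unwrap();
        while let Some(x) = gc.pop() {
            // If someone still holds a reference, the item was recycled too
            // early. Boot it (drop our copy) and eat the allocation cost
            // instead of handing out a shared item.
            if Arc::strong_count(&x) == 1 {
                return x;
            }
        }
        Arc::new(Mutex::new(T::default()))
    }

    fn recycle(&self, x: Arc<Mutex<T>>) {
        self.gc.lock().unwrap().push(x);
    }
}
```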
* Send transactions from the mint's private key
* By default, send full balance to oneself
* By default, request the mint's number of tokens for airdrops
Now that the Bank is single-threaded again, we can spin up new
accounts on the fly without concern of thread contention. Likewise,
we can send all transactions from a single account, which was also
problematic in the multi-threaded bank. Sending from one account will
also make client-demo straightforward to port to solana-drone.
This change will make these benchmarks way slower, because it's now
cloning the transaction vector each iteration instead of the ref
counts. We need to rethink these.
This patch likely fixes the sporadic failures in the following tests:
```
test server::tests::validator_exit ... FAILED
test streamer::test::streamer_send_test ... FAILED
test thin_client::tests::test_bad_sig ... FAILED
test drone::tests::test_send_airdrop ... FAILED
test thin_client::tests::test_thin_client ... FAILED
```
This might fix an awful bug where the streamer reuses a Blob
before the current user is done with it. Recycler should probably
assert ref count is one?
* Also don't collect() an iterator before iterating over it.
pnet_transport takes a long time to build. It's been especially
painful from within a docker container for reasons I don't care
to understand. pnet_datalink is the only part of pnet we're using
so booting the rest.
IPADDR is simple, but not exactly what we need for testnet, where NAT'd
folks need to join in, need to advertise themselves as on the interweb.
myip() helps, but there's some TODOs: fullnode-config probably needs to
be told where it lives in the real world (machine interfaces tell us dick),
or incorporate something like the "ifconfig.co" code in myip.sh
* limit the number of Entries per Blob to at most one
* limit the number of Transactions per Entry such that an Entry will
always fit in a Blob
With a one-to-one map of Entries to Blobs, recovery of a validator
is a simple fast-forward from the end of the initial genesis.log
and tx-*.logs Entries.
TODO: initialize validators' blob index with initial # of Entries.
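A back-of-envelope sketch of that sizing rule; every constant here is an assumption, not the real blob layout.

```rust
// Assumed sizes, for illustration only.
const BLOB_DATA_SIZE: usize = 63 * 1024; // blob payload budget
const ENTRY_HEADER_SIZE: usize = 48;     // num_hashes + id + vec length
const MAX_TX_SIZE: usize = 256;          // serialized transaction upper bound

// Cap Transactions per Entry so a serialized Entry always fits one Blob.
const fn max_transactions_per_entry() -> usize {
    (BLOB_DATA_SIZE - ENTRY_HEADER_SIZE) / MAX_TX_SIZE
}
```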
* add setup to combine init steps, configurable initial mint
* bash -e -> bash and be explicit about errors with || exit $?
* feed transaction logs to validator, too
* use a line iterator on stdin instead of a line iterator on a buffer
* move some unwrap() to expect(), documenting failures
* bind entry type earlier (for kicks)
* fix documentation, other opt parameters
* add support for a named output file, remove hardcoded "leader.log"
* resurrect stdout as the default output
Refactored TVU into stages
* blob fetch stage for blobs
* window stage for maintaining the blob window
* pulled out NCP out of the TVU so they can be separate units
TVU is now just the fetch -> window -> request and bank processing
* Needed for #341. Create a dummy entry with public key 0..., but with a valid gossip address that we can ask for updates. This will allow validators to discover the full network by just knowing a single node's gossip address without knowing anything else about their identity.
* once we start removing dead validators this entry should get purged since we will never see a message from public key 0, #344
It's not always a problem if line coverage drops. For example,
coverage will drop if you make well-tested code more succinct.
It just means the uncovered code is just a larger percentage of
the codebase.
Because we keep changing those scripts and not updating the readme.
Also, this removes the "-b 9000" when starting validators. Is that right?
Or should we be passing that to the validator config?
The first header and its content is the only text displayed on
GitHub's mobile page. Reorder so that the disclaimer is the only
information people see.
Disclaimer: IANAL and assume reordering these doesn't matter. :)
We added them thinking it'd be a good stepping stone towards an
asynchronous thin client, but they're used inconsistently, and where
they are used, the function is still synchronous, which is just confusing.
Disclaimer
===
All claims, content, designs, algorithms, estimates, roadmaps, specifications, and performance measurements described in this project are done with the author's best effort. It is up to the reader to check and validate their accuracy and truthfulness. Furthermore nothing in this project constitutes a solicitation for investment.
Solana: High Performance Blockchain
===
Solana™ is a new architecture for a high performance blockchain. It aims to support
over 700 thousand transactions per second on a gigabit network.
Introduction
===
It's possible for a centralized database to process 710,000 transactions per second on a standard gigabit network if the transactions are, on average, no more than 176 bytes. A centralized database can also replicate itself and maintain high availability without significantly compromising that transaction rate using the distributed system technique known as Optimistic Concurrency Control [\[H.T.Kung, J.T.Robinson (1981)\]](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.65.4735). At Solana, we're demonstrating that these same theoretical limits apply just as well to blockchain on an adversarial network. The key ingredient? Finding a way to share time when nodes can't trust one-another. Once nodes can trust time, suddenly ~40 years of distributed systems research becomes applicable to blockchain!
> Perhaps the most striking difference between algorithms obtained by our method and ones based upon timeout is that using timeout produces a traditional distributed algorithm in which the processes operate asynchronously, while our method produces a globally synchronous one in which every process does the same thing at (approximately) the same time. Our method seems to contradict the whole purpose of distributed processing, which is to permit different processes to operate independently and perform different functions. However, if a distributed system is really a single system, then the processes must be synchronized in some way. Conceptually, the easiest way to synchronize processes is to get them all to do the same thing at the same time. Therefore, our method is used to implement a kernel that performs the necessary synchronization--for example, making sure that two different processes do not try to modify a file at the same time. Processes might spend only a small fraction of their time executing the synchronizing kernel; the rest of the time, they can operate independently--e.g., accessing different files. This is an approach we have advocated even when fault-tolerance is not required. The method's basic simplicity makes it easier to understand the precise properties of a system, which is crucial if one is to know just how fault-tolerant the system is. [\[L.Lamport (1984)\]](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1078)
Furthermore, and much to our surprise, it can be implemented using a mechanism that has existed in Bitcoin since day one. The Bitcoin feature is called nLocktime and it can be used to postdate transactions using block height instead of a timestamp. As a Bitcoin client, you'd use block height instead of a timestamp if you don't trust the network. Block height turns out to be an instance of what's being called a Verifiable Delay Function in cryptography circles. It's a cryptographically secure way to say time has passed. In Solana, we use a far more granular verifiable delay function, a SHA 256 hash chain, to checkpoint the ledger and coordinate consensus. With it, we implement Optimistic Concurrency Control and are now well en route towards that theoretical limit of 710,000 transactions per second.
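As a toy illustration of that idea (assuming the Rust sha2 crate; this is not part of the demo), each hash must follow the previous one, so the length of the chain is a cryptographically verifiable proxy for elapsed time:

```rust
use sha2::{Digest, Sha256};

fn main() {
    // Each iteration is one "tick": the next state can only be computed
    // after the previous one, so the chain cannot be parallelized.
    let mut state: [u8; 32] = Sha256::digest(b"genesis").into();
    for tick in 0..1_000_000u64 {
        state = Sha256::digest(&state).into();
        if tick % 100_000 == 0 {
            println!("tick {}: {:02x?}", tick, &state[..4]);
        }
    }
}
```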
Testnet Demos
===
The Solana repo contains all the scripts you might need to spin up your own
local testnet. Depending on what you're looking to achieve, you may want to
run a different variation, as the full-fledged, performance-enhanced
multinode testnet is considerably more complex to set up than a Rust-only,
singlenode testnet. If you are looking to develop high-level features, such
as experimenting with smart contracts, save yourself some setup headaches and
stick to the Rust-only singlenode demo. If you're doing performance optimization
of the transaction pipeline, consider the enhanced singlenode demo. If you're
doing consensus work, you'll need at least a Rust-only multinode demo. If you want
to reproduce our TPS metrics, run the enhanced multinode demo.
For all four variations, you'd need the latest Rust toolchain and the Solana source code.
Configuration Setup
---
The network is initialized with a genesis ledger and leader/validator configuration files.
These files can be generated by running the following script.
```bash
$ ./multinode-demo/setup.sh
```
Drone
---
In order for the leader, client and validators to work, we'll need to
spin up a drone to give out some test tokens. The drone delivers Milton
Friedman-style "air drops" (free tokens to requesting clients) to be used in
test transactions.
Start the drone on the leader node with:
```bash
$ ./multinode-demo/drone.sh
```
Singlenode Testnet
---
Before you start a fullnode, make sure you know the IP address of the machine you
want to be the leader for the demo, and make sure that udp ports 8000-10000 are
open on all the machines you want to test with.
Now start the server in a separate shell:
```bash
$ cat ./multinode-demo/leader.sh
#!/bin/bash
export RUST_LOG=solana=info
sudo sysctl -w net.core.rmem_max=26214400
cat genesis.log leader.log | cargo run --release --features cuda --bin solana-fullnode -- -s leader.json -l leader.json -b 8000 -d 2>&1 | tee leader-tee.log
$ ./multinode-demo/leader.sh
```
Wait a few seconds for the server to initialize. It will print "leader ready..." when it's ready to
receive transactions. The leader will request some tokens from the drone if it doesn't have any.
The drone does not need to be running for subsequent leader starts.
Now you can start some validators:
Multinode Testnet
---
To run a multinode testnet, after starting a leader node, spin up some validator nodes in separate shells.
You can observe the effects of your client's transactions on our [dashboard](https://metrics.solana.com:3000/d/testnet/testnet-hud?orgId=2&from=now-30m&to=now&refresh=5s&var-testnet=testnet)
Linux Snap
---
A Linux [Snap](https://snapcraft.io/) is available, which can be used to
easily get Solana running on supported Linux systems without building anything
from source. The `edge` Snap channel is updated daily with the latest
development from the `master` branch. To install:
```bash
$ sudo snap install solana --edge --devmode
```
(`--devmode` flag is required only for `solana.fullnode-cuda`)
Once installed the usual Solana programs will be available as `solana.*` instead
of `solana-*`. For example, `solana.fullnode` instead of `solana-fullnode`.
Update to the latest version at any time with:
```bash
$ snap info solana
$ sudo snap refresh solana --devmode
```
### Daemon support
The snap supports running a leader, validator or leader+drone node as a system
daemon.
Run `sudo snap get solana` to view the current daemon configuration. To view
daemon logs:
1. Run `sudo snap logs -n=all solana` to view the daemon initialization log
2. Runtime logging can be found under `/var/snap/solana/current/leader/`,
`/var/snap/solana/current/validator/`, or `/var/snap/solana/current/drone/` depending
on which `mode=` was selected. Within each log directory the file `current`
contains the latest log, and the files `*.s` (if present) contain older rotated
logs.
Disable the daemon at any time by running:
```bash
$ sudo snap set solana mode=
```
Runtime configuration files for the daemon can be found in
`/var/snap/solana/current/config`.
#### Leader daemon
```bash
$ sudo snap set solana mode=leader
```
If CUDA is available:
```bash
$ sudo snap set solana mode=leader enable-cuda=1
```
`rsync` must be configured and running on the leader.
1. Ensure rsync is installed with `sudo apt-get -y install rsync`
2. Edit `/etc/rsyncd.conf` to include the following
```
[config]
path = /var/snap/solana/current/config
hosts allow = *
read only = true
```
3. Run `sudo systemctl enable rsync; sudo systemctl start rsync`
4. Test by running `rsync -Pzravv rsync://<ip-address-of-leader>/config
solana-config` from another machine. **If the leader is running on a cloud
provider it may be necessary to configure the firewall rules to permit ingress
to ports tcp:873, tcp:9900 and the port range udp:8000-udp:10000**
To run both the Leader and Drone:
```bash
$ sudo snap set solana mode=leader+drone
```
#### Validator daemon
```bash
$ sudo snap set solana mode=validator
```
If CUDA is available:
```bash
$ sudo snap set solana mode=validator enable-cuda=1
```
By default the validator will connect to **testnet.solana.com**; override
the leader IP address by running:
```bash
$ sudo snap set solana mode=validator leader-address=127.0.0.1 #<-- change IP address
```
It's assumed that the leader will be running `rsync` configured as described in
the previous **Leader daemon** section.
Developing
===
```bash
$ source $HOME/.cargo/env
$ rustup component add rustfmt-preview
```
If your rustc version is lower than 1.26.1, please update it:
```bash
$ rustup update
```
On Linux systems you may need to install libssl-dev, pkg-config, zlib1g-dev, etc. On Ubuntu:
Solana uses a channel-oriented, date-based branching process described [here](https://github.com/solana-labs/solana/blob/master/rfcs/rfc-005-branches-tags-and-channels.md).
## Release Steps
### Changing channels
When cutting a new channel branch these pre-steps are required:
1. Pick your branch point for release on master.
2. Create the branch. The name should be "v" + the first 2 "version" fields from Cargo.toml. For example, a Cargo.toml with version = "0.9.0" implies the next branch name is "v0.9".
3. Update Cargo.toml to the next semantic version (e.g. 0.9.0 -> 0.10.0) by running `./scripts/increment-cargo-version.sh`.
4. Push your new branch to solana.git
5. Land your Cargo.toml change as a master PR.
At this point, ci/channel-info.sh should show your freshly cut release branch as "BETA_CHANNEL" and the previous release branch as "STABLE_CHANNEL".
### Updating channels (i.e. "making a release")
We use [github's Releases UI](https://github.com/solana-labs/solana/releases) for tagging a release.
1. Go [there ;)](https://github.com/solana-labs/solana/releases).
2. Click "Draft new release".
3. If the first major release on the branch (e.g. v0.8.0), paste in [this template](https://raw.githubusercontent.com/solana-labs/solana/master/.github/RELEASE_TEMPLATE.md) and fill it in.
4. Test the release by generating a tag using semver's rules. First try at a release should be <branchname>.X-rc.0.
5. Verify release automation:
1. [Crates.io](https://crates.io/crates/solana) should have an updated Solana version.
2. ...
6. After testnet deployment, verify that testnets are running correct software. http://metrics.solana.com should show testnet running on a hash from your newly created branch.
Solana nodes accept HTTP requests using the [JSON-RPC 2.0](https://www.jsonrpc.org/specification) specification.
To interact with a Solana node inside a JavaScript application, use the [solana-web3.js](https://github.com/solana-labs/solana-web3.js) library, which gives a convenient interface for the RPC methods.
Returns the status of a given signature. This method is similar to
[confirmTransaction](#confirmtransaction) but provides more resolution for error
events.
##### Parameters:
* `string` - Signature of Transaction to confirm, as base-58 encoded string
##### Results:
* `string` - Transaction status:
  * `Confirmed` - Transaction was successful
  * `SignatureNotFound` - Unknown transaction
  * `ProgramRuntimeError` - An error occurred in the program that processed this Transaction
  * `AccountInUse` - Another Transaction had a write lock on one of the Accounts specified in this Transaction. The Transaction may succeed if retried
  * `GenericFailure` - Some other error occurred. **Note**: In the future new Transaction statuses may be added to this list. It's safe to assume that all new statuses will be more specific error conditions that previously presented as `GenericFailure`
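As a sketch, the request body for this method can be built with serde_json (assuming the method is named `getSignatureStatus`; the HTTP POST to the node's RPC port is omitted):

```rust
use serde_json::json;

fn main() {
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getSignatureStatus",
        "params": ["<base-58 encoded signature>"]
    });
    // Send this as the body of an HTTP POST to the node's RPC port.
    println!("{}", request);
}
```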
This directory contains scripts useful for working with a test network. It's
intended to be both dev and CD friendly.
### User Account Prerequisites
GCP and AWS are supported.
#### GCP
First authenticate with
```bash
$ gcloud auth login
```
#### AWS
Obtain your credentials from the AWS IAM Console and configure the AWS CLI with
```bash
$ aws configure
```
More information on AWS CLI configuration can be found [here](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#cli-quick-configuration)
### Metrics configuration
Ensure that `$(whoami)` is the name of an InfluxDB user account with enough
access to create a new InfluxDB database. Ask mvines@ for help if needed.
## Quick Start
NOTE: This example uses GCP. If you are using AWS, replace `./gce.sh` with
`./ec2.sh` in the commands.
```bash
$ cd net/
$ ./gce.sh create -n 5 -c 1 #<-- Create a GCE testnet with 5 validators, 1 client (billing starts here)
$ ./init-metrics.sh $(whoami) #<-- Configure a metrics database for the testnet
$ ./net.sh start #<-- Deploy the network from the local workspace
$ ./ssh.sh #<-- Details on how to ssh into any testnet node
$ ./gce.sh delete #<-- Dispose of the network (billing stops here)
```
## Tips
### Running the network over public IP addresses
By default private IP addresses are used with all instances in the same
availability zone to avoid GCE network egress charges. However, to run the
network over public IP addresses:
```bash
$ ./gce.sh create -P ...
```
or
```bash
$ ./ec2.sh create -P ...
```
### Deploying a Snap-based network
To deploy the latest pre-built `edge` channel Snap (i.e., latest from the `master`
branch), once the testnet has been created run:
```bash
$ ./net.sh start -s edge
```
### Enabling CUDA
First ensure the network instances are created with GPU enabled:
```bash
$ ./gce.sh create -g ...
```
or
```bash
$ ./ec2.sh create -g ...
```
If deploying a Snap-based network nothing further is required, as GPU presence
is detected at runtime and the CUDA build is auto selected.
If deploying a locally-built network, first run `./fetch-perf-libs.sh` then
ensure the `cuda` feature is specified at network start:
```bash
$ ./net.sh start -f "cuda,erasure"
```
### How to interact with a CD testnet deployed by ci/testnet-deploy.sh
**AWS-Specific Extra Setup**: Follow the steps in `scripts/add-solana-user-authorized_keys.sh`,
then redeploy the testnet before continuing in this section.
Taking **master-testnet-solana-com** as an example, configure your workspace for
the testnet using:
```bash
$ ./gce.sh config -p master-testnet-solana-com
```
or
```bash
$ ./ec2.sh config -p master-testnet-solana-com
```
Then run the following for details on how to ssh into any testnet node